Digital Preservation in the era of Big Data
The DIACHRON Platform, archiving and querying linked open data
George Papastefanatos
gpapas@imis.athena-innovation.gr
“Athena” Research & Innovation Center
Panel discussion: Preparing for change
Acting on Change Conference : New Approaches and Future Practices in LTDP
London, Dec 2016
Data Explosion
BIG DATA GENERATE SIGNIFICANT FINANCIAL VALUE ACROSS SECTORS.
Data on the Web
• Global data space
• Connecting data from diverse domains and sources
• Primary objects: (description of) “entities”
• Links between “entities”
• Info granularity: from entire data collections to atomic data
Interrelated, Heterogeneous
Adapted from Chris Bizer, Richard Cyganiak, Tom Heath,
available at http://linkeddata.org/guides-and-tutorials
Web of Data
[Figure: conceptual representation of the Web of Data. Real-world things are represented by entities that are identified by URIs and connected through typed links; the underlying sources include spreadsheets, HTML, XML, RDFa, semi-structured data, triples and statistical data.]
Web of world things described by Web of data
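As a minimal sketch of the "entities plus typed links" idea, the snippet below stores a few triples and extracts the typed links between entities. The example.org URIs and the helper names are assumptions made for illustration; they do not come from the slides.

```python
from collections import defaultdict

# Hypothetical triples (subject, predicate, object); example.org URIs are made up.
TRIPLES = [
    ("http://example.org/person/1", "http://xmlns.com/foaf/0.1/name", "John Doe"),
    ("http://example.org/person/1", "http://xmlns.com/foaf/0.1/based_near",
     "http://example.org/place/athens"),
    ("http://example.org/place/athens",
     "http://www.w3.org/2000/01/rdf-schema#label", "Athens"),
]


def is_uri(value):
    """Treat anything that looks like an HTTP(S) URI as an entity, the rest as literals."""
    return value.startswith("http://") or value.startswith("https://")


def outgoing_links(triples):
    """Group typed links (predicate, target entity) by source entity, skipping literals."""
    links = defaultdict(list)
    for subject, predicate, obj in triples:
        if is_uri(obj):
            links[subject].append((predicate, obj))
    return dict(links)


print(outgoing_links(TRIPLES))
```

Only the second triple is a link between two entities; the other two attach literal descriptions to an entity.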
Data Web Evolution
• Explosion of data volume published on the web and diversity of sources
– Government
– Scientific
– Corporate
– Crowd-sourced
• Linked Open Data (LOD) continuously published
• Currently data.gouv.fr lists 350,000 datasets; data.gov.uk has 8,200 datasets.
Current Status
[Figure: snapshots for 2007, 2009 and 2011]
Statistics
Datasets: 1,014
Social web 51.28%
Government 18.05%
Publications 9.47%
Life sciences 8.19%
User-generated content 4.73%
Cross-domain 4.04%
Media 2.17%
Geographic 2.07%
Rapidly Evolving Ecosystem
Mid 2014
http://lod-cloud.net/
Big Data Preservation is Challenging
Emerging Application Domains
2020: digital data production > 40 zettabytes =
5,200 Gbytes for every person on the planet
WIRED – 09/10/2014
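A quick check of the arithmetic behind the two figures above, assuming decimal (SI) units, which the slide does not state: dividing 40 zettabytes by 5,200 gigabytes per person gives the population the estimate implies.

```python
# Assumption: decimal SI units (1 ZB = 10**21 bytes, 1 GB = 10**9 bytes).
ZETTABYTE = 10**21
GIGABYTE = 10**9

total_bytes = 40 * ZETTABYTE
per_person_bytes = 5200 * GIGABYTE

# Population implied by the two figures quoted on the slide.
implied_population = total_bytes / per_person_bytes
print(f"{implied_population / 1e9:.2f} billion people")  # prints "7.69 billion people"
```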
Effective & efficient techniques to manage the data lifecycle
Appraisal
Integration
Archiving
Producing
Publishing
Linking
DIACHRON Approach
Publishing and preservation of data performed together
Archiving and dissemination are synonymous.
DIACHRON Model
Dataset Model
[Figure: DIACHRON dataset model ([M1-M12], Task 1.5).
Time-agnostic space: a diachronic dataset D1, and diachronic resources (Diachronic Resource a, Diachronic Resource b) that can be linked across datasets (Diachronic Dataset D2) with owl:sameAs.
Time-aware space: the versions D1(t1), D1(t2), D1(t3), …, D1(tm) along a time axis t1, t2, t3, t4, …, with change sets between versions (labelled D1(t1,t2) in the figure).
Each version, e.g. D1(tm), has a RecordSet(tm) and a Schema(tm). A record (Record_1, Record_2, …, Record_i) has a subject, e.g. Resource_a (D1,tm), and record attributes, each with a predicate and an object, e.g. vcard:hasAddress “6 Artemidos st.” and foaf:name “John Doe”.
Resource Changes (D1,tm,tn) capture record and schema changes between versions; Resource_a also appears in other versions and datasets, e.g. Resource_a (D1,tn) and Resource_a (D2,tk).
The figure distinguishes the Data Space from the Curated Information Space.]
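The dataset model can be sketched with a few Python dataclasses. This is only an illustration of the concepts named in the figure (diachronic dataset, versions, record sets, record attributes, change sets); the class and field names are assumptions and do not reproduce the DIACHRON vocabulary or implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RecordAttribute:
    predicate: str  # e.g. "foaf:name"
    obj: str        # e.g. "John Doe"


@dataclass
class Record:
    subject: str                      # the resource the record describes
    attributes: list = field(default_factory=list)


@dataclass
class DatasetVersion:
    timestamp: str                    # version label such as "t1"
    schema: set                       # predicates allowed in this version
    records: dict                     # record id -> Record


@dataclass
class ChangeSet:
    old: str                          # older version label
    new: str                          # newer version label
    changes: list                     # free-text change descriptions (hypothetical format)


@dataclass
class DiachronicDataset:
    name: str
    versions: dict = field(default_factory=dict)
    change_sets: list = field(default_factory=list)

    def add_version(self, version):
        self.versions[version.timestamp] = version

    def at(self, timestamp):
        """Return the time-aware instantiation of the dataset at one version."""
        return self.versions[timestamp]


d1 = DiachronicDataset("D1")
name = RecordAttribute("foaf:name", "John Doe")
address = RecordAttribute("vcard:hasAddress", "6 Artemidos st.")
d1.add_version(DatasetVersion("t1", {"foaf:name"},
                              {"Record_1": Record("Resource_a", [name])}))
d1.add_version(DatasetVersion("t2", {"foaf:name", "vcard:hasAddress"},
                              {"Record_1": Record("Resource_a", [name, address])}))
d1.change_sets.append(ChangeSet("t1", "t2", ["add attribute vcard:hasAddress to Record_1"]))

print(d1.at("t2").records["Record_1"].attributes)
```

The diachronic dataset itself carries no timestamp; only its versions and change sets do, which mirrors the split between the time-agnostic and time-aware spaces.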
DIACHRON Query Language
• Queries on archive catalog
• Lists of datasets
• Lists of versions of a given dataset
• Filtered based on temporal, provenance or other metadata criteria
• Queries on Data
• Retrieve part(s) of a dataset that match certain criteria.
• Longitudinal queries
• Retrieve part(s) of a dataset across multiple versions.
• Temporal (version based) criteria can be applied.
• Queries on Changes
• Retrieve changes between two concurrent versions.
• Limit results for specific type of changes (schema, data, etc.).
• Mixed Queries on Changes and Data
• Retrieve datasets or parts of datasets affected by specific changes
Requirements
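The query requirements above can be illustrated on a toy in-memory archive. This is a sketch in plain Python, not the DIACHRON query language: the archive layout, the version labels and the function names are all assumptions made for the example.

```python
# Toy archive: dataset name -> ordered list of (version label, set of triples).
ARCHIVE = {
    "D1": [
        ("v1", {("Resource_a", "foaf:name", "John Doe")}),
        ("v2", {("Resource_a", "foaf:name", "John Doe"),
                ("Resource_a", "vcard:hasAddress", "6 Artemidos st.")}),
    ],
}


def list_datasets(archive):
    """Catalog query: which datasets are archived?"""
    return sorted(archive)


def list_versions(archive, dataset):
    """Catalog query: which versions exist for a dataset?"""
    return [label for label, _ in archive[dataset]]


def query_data(archive, dataset, version, predicate):
    """Data query: triples of one version that match a predicate."""
    for label, triples in archive[dataset]:
        if label == version:
            return sorted(t for t in triples if t[1] == predicate)
    raise KeyError(version)


def longitudinal(archive, dataset, subject, predicate):
    """Longitudinal query: values of (subject, predicate) in every version, in order."""
    return [(label, sorted(o for s, p, o in triples if s == subject and p == predicate))
            for label, triples in archive[dataset]]


def changes(archive, dataset, old, new):
    """Change query: triples added and deleted between two versions."""
    versions = dict(archive[dataset])
    return {"added": sorted(versions[new] - versions[old]),
            "deleted": sorted(versions[old] - versions[new])}


print(list_versions(ARCHIVE, "D1"))
print(longitudinal(ARCHIVE, "D1", "Resource_a", "vcard:hasAddress"))
print(changes(ARCHIVE, "D1", "v1", "v2"))
```

A mixed query would combine the last two functions, for example by returning only the resources that appear in the result of `changes`.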
DIACHRON Query Language
• Extension of SPARQL
– SPARQL queries are valid DIACHRON queries
• DIACHRON graph model
– basis of the query language, e.g.
– <FROM DATASET>,<FROM CHANGES>, …
• Specific versions
– AT VERSION, AFTER VERSION, BEFORE VERSION, BETWEEN VERSIONS
• Syntactic Sugar for graph patterns, e.g.
– RECORD (e.g. for record variable)
– RECATT
• Query results dereified
Overview
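One way to read the four version keywords is as selectors over an ordered list of versions. The sketch below assumes that AFTER and BEFORE exclude the named version and that BETWEEN includes both endpoints; the slides do not specify these boundaries, and the version labels and function names are hypothetical.

```python
# Hypothetical version labels, ordered from oldest to newest.
VERSIONS = ["v1", "v2", "v3", "v4", "v5"]


def at_version(versions, v):
    """AT VERSION: exactly the named version."""
    return [v] if v in versions else []


def after_version(versions, v):
    """AFTER VERSION: everything newer than v (v itself excluded, by assumption)."""
    return versions[versions.index(v) + 1:]


def before_version(versions, v):
    """BEFORE VERSION: everything older than v (v itself excluded, by assumption)."""
    return versions[:versions.index(v)]


def between_versions(versions, start, end):
    """BETWEEN VERSIONS: from start to end, both included (by assumption)."""
    i, j = versions.index(start), versions.index(end)
    return versions[i:j + 1]


print(after_version(VERSIONS, "v3"))
print(between_versions(VERSIONS, "v2", "v4"))
```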
Archiving Strategies
• Versions Materialization
(query efficiency, space consuming)
• Changes (delta-based) Materialization
(space efficiency, poor query performance, update overhead)
• Versions & Changes Materialization
(vast space requirements, update overhead)
1st approach
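The trade-off between the first two strategies can be made concrete with a small sketch. It stores the same three versions once as full snapshots and once as a base version plus deltas, then compares how many triples each store keeps. The class names and the set-based delta format are assumptions for illustration, not the DIACHRON implementation.

```python
def diff(old, new):
    """Delta between two versions, as sets of added and deleted triples."""
    return {"added": new - old, "deleted": old - new}


def apply_delta(triples, delta):
    return (triples - delta["deleted"]) | delta["added"]


class FullStore:
    """Versions materialization: every version is stored in full."""

    def __init__(self):
        self.snapshots = []

    def commit(self, triples):
        self.snapshots.append(frozenset(triples))

    def get(self, i):
        return set(self.snapshots[i])  # direct lookup, no reconstruction

    def stored_triples(self):
        return sum(len(s) for s in self.snapshots)


class DeltaStore:
    """Changes materialization: the first version plus one delta per later version."""

    def __init__(self):
        self.base = None
        self.deltas = []

    def commit(self, triples):
        triples = set(triples)
        if self.base is None:
            self.base = frozenset(triples)
        else:
            latest = self.get(len(self.deltas))  # update overhead: rebuild the latest version
            self.deltas.append(diff(latest, triples))

    def get(self, i):
        current = set(self.base)
        for delta in self.deltas[:i]:            # query cost grows with i
            current = apply_delta(current, delta)
        return current

    def stored_triples(self):
        return len(self.base) + sum(len(d["added"]) + len(d["deleted"]) for d in self.deltas)


v0 = {("a", "p", "1"), ("b", "p", "2"), ("c", "p", "3")}
v1 = v0 | {("d", "p", "4")}
v2 = v1 - {("a", "p", "1")}

full, delta = FullStore(), DeltaStore()
for version in (v0, v1, v2):
    full.commit(version)
    delta.commit(version)

print(full.stored_triples(), delta.stored_triples())  # prints "10 5"
```

Both stores return the same content for every version; the delta store keeps fewer triples but has to replay deltas on every read and every commit.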
Archiving Strategies
• Hybrid Materialization
• Only major versions and all changes (deltas) are stored
• Balance between query performance & storage space
2nd approach
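A minimal sketch of the hybrid strategy follows. It assumes that a "major version" is simply every n-th commit, which the slides do not define; the class and method names are likewise hypothetical. A version is rebuilt from the nearest earlier snapshot, so the number of deltas to replay is bounded by the snapshot interval.

```python
class HybridStore:
    """Stores a full snapshot for every n-th version and a delta for every version."""

    def __init__(self, snapshot_every=3):
        self.snapshot_every = snapshot_every
        self.snapshots = {}          # version index -> frozenset of triples
        self.deltas = []             # deltas[i] turns version i into version i + 1
        self.count = 0
        self._latest = frozenset()   # working copy of the newest version

    def commit(self, triples):
        triples = frozenset(triples)
        if self.count > 0:
            self.deltas.append((triples - self._latest, self._latest - triples))
        if self.count % self.snapshot_every == 0:
            self.snapshots[self.count] = triples
        self._latest = triples
        self.count += 1

    def _nearest_snapshot(self, i):
        return max(k for k in self.snapshots if k <= i)

    def replay_length(self, i):
        """Number of deltas that must be applied to rebuild version i."""
        return i - self._nearest_snapshot(i)

    def get(self, i):
        if not 0 <= i < self.count:
            raise IndexError(i)
        start = self._nearest_snapshot(i)
        current = set(self.snapshots[start])
        for added, deleted in self.deltas[start:i]:
            current = (current - deleted) | added
        return current


# Five growing versions: version i holds triples 0..i.
versions = [{("s", "p", str(j)) for j in range(i + 1)} for i in range(5)]

store = HybridStore(snapshot_every=3)
for version in versions:
    store.commit(version)

print(sorted(store.snapshots))                       # prints "[0, 3]"
print(store.replay_length(2), store.replay_length(4))  # prints "2 1"
```

A smaller snapshot interval moves the store towards full versions materialization; a larger one moves it towards pure delta storage.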
DIACHRON applications
The Pilots
Thank you!
www.diachron-fp7.eu
DIACHRON
• http://wwwdev.ebi.ac.uk/ols/beta/ontologies/go
• http://diachron.imis.athena-innovation.gr:8080/services/ui/
• https://www.youtube.com/channel/UCIzfRLHiuOz4ZgaSytAgP7w
• https://twitter.com/diachron_fp7
@diachron_fp7
• https://github.com/diachron
Demos & Outreach

Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 

Digital Preservation in the era of Big Data - The Diachron Platform - Acting on Change 2016

  • 1. Digital Preservation in the era of Big Data: The DIACHRON Platform, archiving and querying linked open data. George Papastefanatos, gpapas@imis.athena-innovation.gr, “Athena” Research & Innovation Center. Panel discussion: Preparing for change. Acting on Change Conference: New Approaches and Future Practices in LTDP, London, Dec 2016
  • 2. Data Explosion BIG DATA GENERATE SIGNIFICANT FINANCIAL VALUE ACROSS SECTORS.
  • 3. Data on the Web • Global data space • Connecting data from diverse domains and sources • Primary objects: (description of) “entities” • Links between “entities” • Info granularity: from entire data collections to atomic data • Interrelated, Heterogeneous [Figure: Web of Data, conceptual representation. Entities identified by URIs and connected by typed links; sources include spreadsheets, HTML, XML, RDFa, semi-structured data, triples and statistical data; web of world things described by web of data. Adapted from Chris Bizer, Richard Cyganiak, Tom Heath, available at http://linkeddata.org/guides-and-tutorials]
  • 4. Data Web Evolution • Explosion of data volume published on web and diversity of sources: Government, Scientific, Corporate, Crowd-sourced • Linked Open Data (LOD) continuously published • Currently data.gouv.fr lists 350,000 datasets, data.gov.uk has 8,200 datasets. [Figure: Current Status, 2007, 2009, 2011]
  • 5. Rapidly Evolving Ecosystem (mid 2014, http://lod-cloud.net/) • Statistics: number of datasets: 1014 • Social web 51.28% • Government 18.05% • Publications 9.47% • Life sciences 8.19% • User-generated content 4.73% • Cross-domain 4.04% • Media 2.17% • Geographic 2.07%
  • 6. Big Data Preservation is Challenging • Emerging Application Domains • 2020: digital data production > 40 zettabytes = 5,200 Gbytes for every person on the planet (WIRED – 09/10/2014) • Effective & efficient techniques to manage the data lifecycle: Appraisal, Integration, Archiving, Producing, Publishing, Linking
  • 7. DIACHRON Approach Publishing and preservation of data performed together Archiving and dissemination are synonymous.
  • 9. Dataset Model [M1-M12] – Task 1.5 [Figure: Diachronic Dataset D1 with versions D1(t1), D1(t2), D1(t3), …, D1(tm) along a time axis t1, t2, t3, t4, …, and change sets such as Change Set D1(t1,t2) between versions. The diagram is organised into a Time-Agnostic Space and a Time-Aware Space, and into a Data Space and a Curated Information Space. A version holds a RecordSet(tm) and a Schema(tm). Records (Record_1, Record_2, …, Record_i) have a subject, e.g. Resource_a (D1,tm) or Resource_a (D1,tn), and record attributes with a predicate and an object, e.g. vcard:hasAddress “6 Artemidos st.” and foaf:name “John Doe”. Resource Changes (D1,tm,tn) cover record and schema changes. Diachronic Resource b is linked by owl:sameAs to Diachronic Resource a; a second Diachronic Dataset D2 with Resource_a (D2,tk) is also shown.]
  • 15. DIACHRON Query Language: Requirements • Queries on archive catalog: lists of datasets; lists of versions of a given dataset; filtered based on temporal, provenance or other metadata criteria • Queries on Data: retrieve part(s) of a dataset that match certain criteria • Longitudinal queries: retrieve part(s) of a dataset across multiple versions; temporal (version based) criteria can be applied • Queries on Changes: retrieve changes between two concurrent versions; limit results to specific types of changes (schema, data, etc.) • Mixed Queries on Changes and Data: retrieve datasets or parts of datasets affected by specific changes
  • 16. DIACHRON Query Language: Overview • Extension of SPARQL: SPARQL queries are valid DIACHRON queries • DIACHRON graph model as the basis of the query language, e.g. <FROM DATASET>, <FROM CHANGES>, … • Specific versions: AT VERSION, AFTER VERSION, BEFORE VERSION, BETWEEN VERSIONS • Syntactic sugar for graph patterns, e.g. RECORD (e.g. for record variable), RECATT • Query results dereified
  • 17. Archiving Strategies (1st approach) • Versions Materialization (query efficiency, space consuming) • Changes (delta-based) Materialization (space efficiency, poor query performance, update overhead) • Versions & Changes Materialization (vast space requirements, update overhead)
  • 18. Archiving Strategies (2nd approach) • Hybrid Materialization • Only major versions and all changes (deltas) are stored • Balance between query performance & storage space
  • 21. DIACHRON: Demos & Outreach • http://wwwdev.ebi.ac.uk/ols/beta/ontologies/go • http://diachron.imis.athena-innovation.gr:8080/services/ui/ • https://www.youtube.com/channel/UCIzfRLHiuOz4ZgaSytAgP7w • https://twitter.com/diachron_fp7 @diachron_fp7 • https://github.com/diachron
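
The hybrid archiving strategy on slide 18 (store only major versions plus all deltas) can be illustrated with a small in-memory sketch. This is not the DIACHRON implementation: the names `HybridArchive`, `Delta`, `commit`, `checkout` and `changes`, the integer version numbers, and the rule "every n-th version is a major version" are all assumptions made for illustration. A version is rebuilt from the nearest earlier materialized snapshot by replaying deltas.

```python
from dataclasses import dataclass, field

# A triple is (subject, predicate, object); plain strings keep the sketch simple.
Triple = tuple[str, str, str]


@dataclass(frozen=True)
class Delta:
    """Changes between two consecutive versions (hypothetical structure)."""
    added: frozenset
    deleted: frozenset


def diff(old, new) -> Delta:
    """Compute the delta that turns `old` into `new`."""
    return Delta(added=frozenset(new - old), deleted=frozenset(old - new))


def apply_delta(state, delta: Delta):
    """Replay one delta on top of a set of triples."""
    return (state - delta.deleted) | delta.added


@dataclass
class HybridArchive:
    # Assumption: every `major_every`-th version counts as a "major" version.
    major_every: int = 3
    snapshots: dict = field(default_factory=dict)  # version -> frozenset of triples
    deltas: dict = field(default_factory=dict)     # version v -> delta from v-1 to v
    latest: int = -1
    _head: frozenset = frozenset()                 # working copy of the latest version

    def commit(self, triples) -> int:
        """Archive a new version; always store its delta, sometimes a snapshot."""
        new = frozenset(triples)
        self.latest += 1
        version = self.latest
        if version > 0:
            self.deltas[version] = diff(self._head, new)
        if version % self.major_every == 0:
            self.snapshots[version] = new
        self._head = new
        return version

    def checkout(self, version: int) -> frozenset:
        """Rebuild a version from the nearest earlier snapshot plus deltas."""
        if not 0 <= version <= self.latest:
            raise KeyError(version)
        base = max(v for v in self.snapshots if v <= version)
        state = self.snapshots[base]
        for v in range(base + 1, version + 1):
            state = apply_delta(state, self.deltas[v])
        return state

    def changes(self, v_from: int, v_to: int) -> Delta:
        """Changes between any two archived versions."""
        return diff(self.checkout(v_from), self.checkout(v_to))
```

A larger `major_every` saves space but lengthens the chain of deltas replayed in `checkout`, which is the balance between query performance and storage space that the slide refers to.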

Editor's Notes

  1. Web data is only one source of big data. The use and exploitation of big data in general has come to generate new value not only within the ICT sector, but in widely varying financial sectors, ranging from … (mention the sectors above here)
  2. Data published on the web is one of the main sources of information. Following the Linked Data paradigm, data coming from many diverse domains are published on the web. Primary objects are not HTML documents, but rather entities, uniquely identified by a URI and connected to each other with typed links. This forms a global interconnected dataspace that is independent of the data domain, the data formats, and the granularity of the data (entire data collections vs atomic data).
  3. In this context, big data preservation and archiving is far from trivial. Archiving associated with queries, linking, … almost as good as the most recent version.
  4. At the core, there is a unified DIACHRONIC model for incorporating various data models and their evolution. It is structured across two dimensions, the time dimension and the information dimension. We distinguish between time-aware and time-agnostic objects. Time-aware objects incorporate evolution (changes) and temporal information, whereas time-agnostic objects represent unchangeable, diachronic objects. In the information dimension, we have the data space, where we capture datasets, and the curated space, where resources capture semantically rich notions within datasets.
  5. DIACHRON is a pilot-motivated project. Its primary focus is to deliver services tailored to the real preservation needs of big data providers. Our three use cases are an open-data use case, dealing with governmental and statistical multidimensional data; an enterprise use case, dealing with closed-world enterprise data; and a scientific use case, dealing with biological data.
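
The record-based dataset model (slide 9 and note 4) and the "query results dereified" point on slide 16 can be illustrated with a minimal sketch. It assumes that a record groups the (predicate, object) record attributes of one subject resource; the slides do not spell out the grouping rule, and the names `Record`, `RecordAttribute`, `reify` and `dereify` are hypothetical, not DIACHRON's API.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class RecordAttribute:
    """One (predicate, object) pair attached to a record."""
    predicate: str
    object: str


@dataclass(frozen=True)
class Record:
    """A reified group of statements about one subject resource."""
    record_id: str
    subject: str
    attributes: tuple


def reify(triples):
    """Group plain triples into records (assumption: one record per subject)."""
    by_subject = defaultdict(list)
    for s, p, o in sorted(triples):  # sorted for deterministic record ids
        by_subject[s].append(RecordAttribute(p, o))
    ids = count(1)
    return [
        Record(f"Record_{next(ids)}", subject, tuple(attrs))
        for subject, attrs in by_subject.items()
    ]


def dereify(records):
    """Flatten records back into plain (subject, predicate, object) triples."""
    return {
        (r.subject, a.predicate, a.object)
        for r in records
        for a in r.attributes
    }
```

With this shape, changes can be attached to a record or a record attribute in the archive, while a query can still return ordinary triples by passing its results through `dereify`.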