Experiments with evolving RDF
Sławek Staworko
(joint work with Peter Buneman)
University of Edinburgh
Preservation of evolving data
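[Figure: three versions of a small RDF graph flowing into an archive. Version 1: Tom has cat, cat eats tuna. Version 2: Tom has cat, cat dies Apr 1. Version 3: Tom has dog, dog eats dog food. Further versions follow (…).]
The archive should support: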
• Version retrieval
• Timeline queries
• Storage space efficiency
Approaches to data preservation
• Store all versions
• Store the original database and log the changes
• Hybrid approach of the above two:
  • store the initial version and every 10th version
  • log the changes for the intermediate versions
• Annotation-based approach!
  • never delete data, but annotate its validity with time intervals
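The hybrid approach can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: the class `HybridArchive`, its methods, and the representation of a delta as a pair of inserted and deleted triple sets are all hypothetical. A version is retrieved by starting from the nearest stored snapshot at or before it and replaying the logged deltas.

```python
from typing import Dict, FrozenSet, Set, Tuple

Triple = Tuple[str, str, str]
# Hypothetical delta representation: (inserted triples, deleted triples).
Delta = Tuple[FrozenSet[Triple], FrozenSet[Triple]]


class HybridArchive:
    """Hypothetical hybrid store: a full snapshot of the initial version and of
    every `every`-th version, and a delta for each version in between."""

    def __init__(self, every: int = 10) -> None:
        self.every = every
        self.checkpoints: Dict[int, FrozenSet[Triple]] = {}
        self.deltas: Dict[int, Delta] = {}  # deltas[v] turns version v-1 into v
        self.latest = 0
        self._current: FrozenSet[Triple] = frozenset()

    def add_version(self, snapshot: Set[Triple]) -> int:
        """Store the next version and return its number (starting at 1)."""
        self.latest += 1
        v = self.latest
        new = frozenset(snapshot)
        if v == 1 or v % self.every == 0:
            self.checkpoints[v] = new  # full snapshot
        else:
            self.deltas[v] = (new - self._current, self._current - new)
        self._current = new
        return v

    def retrieve(self, v: int) -> FrozenSet[Triple]:
        """Version retrieval: nearest checkpoint at or before v, then replay deltas."""
        if not 1 <= v <= self.latest:
            raise ValueError(f"no version {v}")
        start = max(c for c in self.checkpoints if c <= v)
        state = set(self.checkpoints[start])
        for w in range(start + 1, v + 1):
            inserted, deleted = self.deltas[w]
            state -= deleted
            state |= inserted
        return frozenset(state)


if __name__ == "__main__":
    v1 = {("Tom", "has", "cat"), ("cat", "eats", "tuna")}
    v2 = {("Tom", "has", "cat"), ("cat", "dies", "Apr 1")}
    v3 = {("Tom", "has", "dog"), ("dog", "eats", "dog food")}
    archive = HybridArchive(every=10)
    for snapshot in (v1, v2, v3):
        archive.add_version(snapshot)
    print(sorted(archive.retrieve(2)))
```

The checkpoint interval `every` trades storage for retrieval time: larger values store fewer full snapshots but replay more deltas per lookup.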
Annotation of RDF
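[Figure: the three versions above combined into one archive graph whose edges carry validity intervals: Tom has cat [1–2], cat eats tuna [1–1], cat dies Apr 1 [2–2], Tom has dog [3–], dog eats dog food [3–]. An interval such as [3–] is still open: the triple is valid from version 3 onwards.]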
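To make the annotated archive concrete, here is a small Python sketch that stores the archive from the slide as a mapping from triples to interval lists and answers a version-retrieval query and a simple timeline query. The names `ARCHIVE`, `retrieve` and `history`, and the use of `None` for an open interval end, are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
# (start, end) validity interval; end=None marks an open interval such as [3–].
Interval = Tuple[int, Optional[int]]
Archive = Dict[Triple, List[Interval]]

# The annotated archive from the slide.
ARCHIVE: Archive = {
    ("Tom", "has", "cat"): [(1, 2)],
    ("cat", "eats", "tuna"): [(1, 1)],
    ("cat", "dies", "Apr 1"): [(2, 2)],
    ("Tom", "has", "dog"): [(3, None)],
    ("dog", "eats", "dog food"): [(3, None)],
}


def valid_at(intervals: List[Interval], version: int) -> bool:
    """True if some interval covers `version`."""
    return any(start <= version and (end is None or version <= end)
               for start, end in intervals)


def retrieve(archive: Archive, version: int) -> Set[Triple]:
    """Version retrieval: the triples whose annotation covers `version`."""
    return {t for t, ivs in archive.items() if valid_at(ivs, version)}


def history(archive: Archive, subject: str) -> Dict[Triple, List[Interval]]:
    """A simple timeline query: every triple ever stated about `subject`."""
    return {t: ivs for t, ivs in archive.items() if t[0] == subject}


if __name__ == "__main__":
    print(sorted(retrieve(ARCHIVE, 2)))
    print(history(ARCHIVE, "Tom"))
```

Note that no data is ever deleted from `ARCHIVE`; every version remains recoverable from the intervals alone.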
What exactly is the input?
Delta = difference between two databases, expressed with two atomic operations: inserting a triple and deleting a triple
[Figure: the three snapshots (Version 1, Version 2, Version 3) side by side with the deltas between them.]
Deltas:
• Version 1 to Version 2: delete (cat, eats, tuna), insert (cat, dies, Apr 1)
• Version 2 to Version 3: delete (Tom, has, cat), insert (Tom, has, dog), insert (dog, eats, dog food), delete (cat, dies, Apr 1)
Snapshots = complete database instances
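Computing a delta from two snapshots is plain set difference when node names are stable across versions. The sketch below assumes exactly that (no blank nodes); `compute_delta` and `apply_delta` are hypothetical names. Because the deleted and inserted sets are disjoint, the operations of one delta can be applied in any order, so the order of the output (sorted here) may differ from the order listed on the slide.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
Op = Tuple[str, Triple]  # ("delete" or "insert", triple)


def compute_delta(old: Set[Triple], new: Set[Triple]) -> List[Op]:
    """Delta between two ground snapshots as atomic delete/insert operations.

    Assumes nodes with the same name denote the same entity in both
    snapshots; blank nodes break this assumption.
    """
    ops: List[Op] = [("delete", t) for t in sorted(old - new)]
    ops += [("insert", t) for t in sorted(new - old)]
    return ops


def apply_delta(snapshot: Set[Triple], ops: List[Op]) -> Set[Triple]:
    """Replay a delta; deletes and inserts touch disjoint triples, so order is irrelevant."""
    result = set(snapshot)
    for kind, t in ops:
        if kind == "delete":
            result.discard(t)
        else:
            result.add(t)
    return result


if __name__ == "__main__":
    v1 = {("Tom", "has", "cat"), ("cat", "eats", "tuna")}
    v2 = {("Tom", "has", "cat"), ("cat", "dies", "Apr 1")}
    for op in compute_delta(v1, v2):
        print(op)
```

With blank nodes this breaks down: if one version contains (Tom, has, _:b1) and the next relabels the same node as _:b7, the diff reports a delete and an insert even though nothing changed.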
Challenges in preserving evolving data with annotations
1. The task is relatively simple if the deltas are known:
  • deleting a triple closes its interval
  • adding a triple opens a new interval
2. It gets complicated when only snapshots are given:
  • it boils down to computing the deltas
  • main challenge: identifying the objects that are the same across versions of the database
Entity resolution problem: which data objects represent the same entity across different versions?
A well-studied database problem in many different settings (from duplicate elimination to record matching)
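When the deltas are known, building the archive is a single pass, as point 1 above describes: a deletion closes the triple's open interval at the previous version and an insertion opens a new one. A minimal Python sketch, assuming version 1 is given as a full set of triples and each later version as an (inserted, deleted) pair; `build_archive` is a hypothetical name.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Interval = Tuple[int, Optional[int]]  # end=None: interval still open
# One delta per version transition: (inserted triples, deleted triples).
Delta = Tuple[Set[Triple], Set[Triple]]


def build_archive(first: Set[Triple],
                  deltas: Iterable[Delta]) -> Dict[Triple, List[Interval]]:
    """Annotate triples with validity intervals, given version 1 and the
    deltas that turn version i-1 into version i (i = 2, 3, ...)."""
    open_ivs: Dict[Triple, int] = {t: 1 for t in first}  # triple -> start version
    closed: Dict[Triple, List[Interval]] = {}
    for version, (inserted, deleted) in enumerate(deltas, start=2):
        for t in deleted:
            if t not in open_ivs:
                raise ValueError(f"delete of absent triple {t}")
            # Deleting a triple closes its interval at the previous version.
            closed.setdefault(t, []).append((open_ivs.pop(t), version - 1))
        for t in inserted:
            if t in open_ivs:
                raise ValueError(f"insert of present triple {t}")
            # Adding a triple opens a new interval.
            open_ivs[t] = version
    archive = {t: list(ivs) for t, ivs in closed.items()}
    for t, start in open_ivs.items():
        archive.setdefault(t, []).append((start, None))
    return archive
```

A triple that is deleted and later re-inserted simply accumulates a second interval, so its full history stays in one place.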
Entity resolution and RDF
URI (Uniform Resource Identifier)
URIs are supposed to make things easy, but…
• RDF also has blank nodes
• URIs don’t exactly solve the problem in the
context of evolving/merged ontologies…
Two different RDF nodes need not represent different objects
Blank nodes
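• The Linked Open Data (LOD) initiative frowns upon them
• Blank nodes are commonplace (and misused?)
[Figure, two panels. Reification: a blank node _b with subject Tom, pred has, object cat stands for the statement "Tom has cat", and Peter believes _b. Complex number: a blank node _b with the two components 2.4 and -0.4.]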
Blank nodes (cont.)
1. Reification (Peter believes that Tom has a cat)
2. Data structures (complex types)
3. Anonymization (Tom has a pet)
Assumptions on reasonable use of blank nodes:
1. They represent concrete objects
2. The objects can be identified from their context
Deblanking
[Figure: LISP-style encoding of the list of numbers [5,3,7] with blank nodes linked by head/tail edges: _b3 has head 5 and tail _b2, _b2 has head 3 and tail _b1, _b1 has head 7 and tail end. Deblanking proceeds bottom-up in three steps: _b1 becomes #(7,end), then _b2 becomes #(3,7,end), then _b3 becomes #(5,3,7,end).]
Assumption: graph has no cycles consisting of blanks only
Assumption: identity of a blank node is determined by its contents
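The bottom-up renaming in the figure can be sketched as an iterative procedure: in each round, every blank node whose outgoing triples no longer mention any blank node is replaced by an identifier built from those triples. This Python sketch relies on both assumptions above. The `_:` prefix for blank nodes (N-Triples syntax), the function names, and the `#(p=o,...)` naming scheme are assumptions; the names include predicates, so they are more verbose than the slide's #(5,3,7,end), and a real implementation might hash the contents instead.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def is_blank(node: str) -> bool:
    # Assumption: blank nodes are written "_:name", as in N-Triples.
    return node.startswith("_:")


def deblank(triples: Set[Triple]) -> Tuple[Set[Triple], int]:
    """Replace blank nodes by identifiers derived from their contents.

    In each round, every blank node whose outgoing triples mention no other
    blank node is renamed to "#(p1=o1,p2=o2,...)". Returns the resulting
    graph and the number of rounds.
    """
    graph = set(triples)
    rounds = 0
    while True:
        blanks = {n for s, _, o in graph for n in (s, o) if is_blank(n)}
        if not blanks:
            return graph, rounds
        # Outgoing (predicate, object) pairs of each blank node.
        out: Dict[str, List[Tuple[str, str]]] = {b: [] for b in blanks}
        for s, p, o in graph:
            if s in out:
                out[s].append((p, o))
        # Blank nodes whose contents are already fully ground.
        ready = {
            b: "#(" + ",".join(f"{p}={o}" for p, o in sorted(edges)) + ")"
            for b, edges in out.items()
            if not any(is_blank(o) for _, o in edges)
        }
        if not ready:
            raise ValueError("graph has a cycle consisting of blank nodes only")
        graph = {(ready.get(s, s), p, ready.get(o, o)) for s, p, o in graph}
        rounds += 1


if __name__ == "__main__":
    lst = {("ex:list", "value", "_:b3"),
           ("_:b3", "head", "5"), ("_:b3", "tail", "_:b2"),
           ("_:b2", "head", "3"), ("_:b2", "tail", "_:b1"),
           ("_:b1", "head", "7"), ("_:b1", "tail", "end")}
    graph, rounds = deblank(lst)
    print(rounds)
    for t in sorted(graph):
        print(t)
```

Because the identifier depends only on contents, two versions that encode the same list with different blank-node labels deblank to identical triples, which is what lets the archive recognize them as the same object. The number of rounds grows with the depth of nesting among blank nodes.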
Experiments
• 10 versions of the Experimental Factor Ontology (EFO), data expressed in OWL
• 200k triples in the 1st version, 290k in the last
• On average 20k blank nodes in each version
• 920k triples overall (blank nodes of different versions are treated as independent)
• Many triples do not last more than 1 version
Experiment: deblanking and life expectancy of an object
Round  Triples  Blanks  Life expectancy
0      921896   165935  2.55
1      358857   33253   6.39
2      348356   28150   6.57
3      339695   23502   6.88
4      330564   18862   7.10
5      318761   14763   7.24
6      311562   11021   7.39
7      304628   7299    7.54
8      297744   3622    7.83
9      285484   58      7.83
10     285334   2       7.83
11     285334   1       7.83
12     285334   0       7.83
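The slide does not define "life expectancy"; one plausible reading, used here only as an assumption, is the average number of versions in which a distinct triple is valid. Under that reading it can be computed from the interval annotations as below (`life_expectancy` is a hypothetical name). It also suggests why deblanking raises the figure: a triple that mentions a blank node gets a fresh node in every version and so appears to live for a single version, whereas after deblanking the versions share one identifier.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Interval = Tuple[int, Optional[int]]  # end=None: still valid


def life_expectancy(archive: Dict[Triple, List[Interval]], last_version: int) -> float:
    """Average number of versions in which a distinct triple is valid.

    Assumption: this is what "life expectancy" measures; open intervals
    (end=None) are counted up to `last_version`.
    """
    if not archive:
        return 0.0
    total = 0
    for intervals in archive.values():
        for start, end in intervals:
            total += (last_version if end is None else end) - start + 1
    return total / len(archive)


if __name__ == "__main__":
    # Hypothetical data: the same fact seen in 10 versions.
    before = {("ex:x", "p", f"_:b{v}"): [(v, v)] for v in range(1, 11)}  # fresh blank each version
    after = {("ex:x", "p", "#(ex:q=1)"): [(1, 10)]}  # one deblanked identifier
    print(life_expectancy(before, 10), life_expectancy(after, 10))
```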
Improving space efficiency
[Figure: lift common intervals to the subject. Peter lives Edinburgh [1–10] and Peter phone +44 712 4567 [1–10] carry the same interval, so it is stored once on the subject: Peter [1–10]. A second panel repeats the idea for dog with has [1–5].]
• Intervals were lifted from all but 33.7k triples (of 285k in total)
• Number of subjects with histories: 34.3k
• Total number of intervals reduced from 285k to 60k
• Size of the index reduced by almost 80%
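Lifting common intervals can be sketched as follows, under the assumption that each subject receives its most frequent interval history and only triples that disagree keep their own annotation; the slides do not spell out the exact rule, and `lift_intervals`, `history_of` and `count_intervals` are hypothetical names.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
History = Tuple[Tuple[int, Optional[int]], ...]  # hashable list of intervals


def lift_intervals(archive: Dict[Triple, History]) -> Tuple[Dict[str, History], Dict[Triple, History]]:
    """Move each subject's most common history onto the subject; only triples
    whose history differs keep an annotation of their own."""
    by_subject: Dict[str, List[Triple]] = defaultdict(list)
    for t in archive:
        by_subject[t[0]].append(t)
    subject_hist: Dict[str, History] = {}
    triple_hist: Dict[Triple, History] = {}
    for subject, ts in by_subject.items():
        # Ties go to the history encountered first (Counter.most_common order).
        common, _ = Counter(archive[t] for t in ts).most_common(1)[0]
        subject_hist[subject] = common
        for t in ts:
            if archive[t] != common:
                triple_hist[t] = archive[t]
    return subject_hist, triple_hist


def history_of(t: Triple, subject_hist: Dict[str, History],
               triple_hist: Dict[Triple, History]) -> History:
    """Recover a triple's history: its own annotation, else its subject's."""
    return triple_hist.get(t, subject_hist[t[0]])


def count_intervals(*annotations: Dict) -> int:
    """Total number of stored intervals across the given annotation maps."""
    return sum(len(h) for ann in annotations for h in ann.values())


if __name__ == "__main__":
    archive = {
        ("Peter", "lives", "Edinburgh"): ((1, 10),),
        ("Peter", "phone", "+44 712 4567"): ((1, 10),),
        ("Tom", "has", "cat"): ((1, 2),),
        ("Tom", "has", "dog"): ((3, None),),
    }
    subj, trip = lift_intervals(archive)
    print(subj, trip)
    print(count_intervals(archive), "->", count_intervals(subj, trip))
```

Every original annotation stays recoverable through `history_of`, so the reduction in stored intervals is lossless under this scheme.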
Future:
• Bisimulation
• Nested RDF
Conclusions
• Annotation offers an attractive way of representing an evolving RDF dataset (need for nested RDF?)
• Evolution of data may require more complex atomic operations, for instance vocabulary evolution: adding, splitting, and merging classes (can bisimulation help here?)
