A sweet affordable combo for Linked Data Archives
Miel Vander Sande

Children are sad because they didn’t get the information they needed.
Sinterklaas got a burn-out.
Linked Data archives have this sustainability problem too.
Many data publishing institutions are under-resourced.
Many of them care about data history.
They look for “good-enough” solutions.
They commonly resort to data dumps.
They cannot afford SPARQL infrastructure.
Linked Data archives have this sustainability problem too.
Handling many clients that ask complex queries is very expensive for a server and hard to scale.
Access to data history makes this problem harder.
Unavailable servers prevent applications from unlocking their potential.
A sweet affordable combo for Linked Data Archives
Pragmatic archiving with HDT
Sustainable querying with Triple Pattern Fragments
Uniform access to history with Memento
Time travelling through DBpedia
Pragmatic archiving with HDT
Header-Dictionary-Triples (HDT) is a compact binary RDF representation.
Single archive file (*.hdt): Header, Dictionary, Triples
Created by Javier Fernández et al. (he should be in this room…)
HDT's features are desirable properties for digital archives.
High volumes: represent massive data sets as a single file
Direct access: rapid search for ?subject ?predicate ?object
Discovery and exchange: included header with dataset metadata
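
The three parts named above can be illustrated with a toy model. This is only a sketch of the idea (a header with metadata, a dictionary that stores each term once, and triples stored as integer IDs); it is not the real HDT binary format, and the class name `ToyHDT` and the `ex:` terms are made up for illustration. Real HDT uses compressed structures instead of the linear scan shown here.

```python
from typing import Iterator, Optional

Triple = tuple[str, str, str]


class ToyHDT:
    """Toy stand-in for an HDT file: header + dictionary + ID triples."""

    def __init__(self, triples: list[Triple], title: str) -> None:
        terms = sorted({term for triple in triples for term in triple})
        # Dictionary: every distinct term is stored once and gets an integer ID.
        self.term_to_id = {term: i for i, term in enumerate(terms)}
        self.id_to_term = terms
        # Triples: only integer IDs, deduplicated and kept sorted.
        self.triples = sorted(
            tuple(self.term_to_id[t] for t in triple) for triple in set(triples)
        )
        # Header: metadata describing the dataset.
        self.header = {"title": title, "triples": len(self.triples)}

    def search(
        self,
        s: Optional[str] = None,
        p: Optional[str] = None,
        o: Optional[str] = None,
    ) -> Iterator[Triple]:
        """Yield triples matching the pattern; None acts as a variable."""
        pattern = []
        for term in (s, p, o):
            if term is None:
                pattern.append(None)
            elif term in self.term_to_id:
                pattern.append(self.term_to_id[term])
            else:
                return  # unknown term: nothing can match
        # Linear scan for clarity; a real archive would use an index here.
        for ids in self.triples:
            if all(want is None or want == got for want, got in zip(pattern, ids)):
                yield tuple(self.id_to_term[i] for i in ids)
```

Calling `search(p="ex:name")` on such an object returns every triple with that predicate, which is the ?subject ?predicate ?object lookup the slide refers to.
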
A matrix of HDT files can serve as a pragmatic RDF archive.
[Figure: matrix of HDT files with one column per dataset (Dataset A, Dataset B, …, Dataset Z) and one row per version (t0, t-1, t-2, t-3, t-4), addressed through a time-based index.]
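
A time-based index over such a matrix could look like the sketch below: for one dataset, it picks the snapshot that was current at a requested datetime. The class name `SnapshotIndex`, the file names, and the fallback to the oldest snapshot for requests that predate the archive are assumptions made for illustration, not details from the slides.

```python
from bisect import bisect_right
from datetime import datetime, timezone


class SnapshotIndex:
    """Maps a requested datetime to the snapshot that was current at that time."""

    def __init__(self, snapshots: dict[datetime, str]) -> None:
        # Keep publication times sorted so lookups can use binary search.
        self._times = sorted(snapshots)
        self._files = [snapshots[t] for t in self._times]

    def select(self, when: datetime) -> str:
        # Position of the last snapshot published at or before `when`.
        pos = bisect_right(self._times, when) - 1
        # Before the first snapshot, fall back to the oldest one available.
        return self._files[max(pos, 0)]


# Hypothetical file names for three versions of one dataset.
index = SnapshotIndex(
    {
        datetime(2014, 1, 1, tzinfo=timezone.utc): "dataset-a-2014.hdt",
        datetime(2015, 4, 1, tzinfo=timezone.utc): "dataset-a-2015-04.hdt",
        datetime(2015, 10, 1, tzinfo=timezone.utc): "dataset-a-2015-10.hdt",
    }
)
```

With this index, a request for mid-2015 resolves to the 2015-04 file, since that was the most recent version at the time.
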
14 DBpedia versions take 12.75% of the original N-Triples size.
[Chart: original size in N-Triples (GB) versus HDT size (GB) for DBpedia versions 2.0, 3.0 through 3.9, 2014, 2015-04, and 2015-10; vertical axis from 0 to 160 GB.]
Space and time-to-publish significantly decreased for DBpedia.
                 Original                       HDT-based
Indexing         Custom                         HDT-CPP
Indexing time    ~24 hours per version          ~4 hours per version
Storage          MongoDB                        HDT binary files
Space            383 GB                         178 GB
# Versions       10 versions: 2.0 through 3.9   14 versions: 2.0 through 2015-10
# Triples        ~3 billion                     ~6 billion
Sustainable querying with Triple Pattern Fragments
Linked Data Fragments: hunting trade-offs between client & server.
[Figure: axis of interfaces offered by the server. Data dump: low server cost, high client cost, high availability, high bandwidth, out-of-date data. SPARQL endpoint: high server cost, low client cost, low availability, low bandwidth, live data. Linked Data pages sit in between.]
A Triple Pattern Fragments interface is low-cost and enables clients to query.
[Figure: the same axis with data dump, Linked Data pages, triple pattern fragments, and SPARQL query results; triple pattern fragments are labeled low server cost, high availability, and live data.]
A Triple Pattern Fragments interface acts as a gateway to an RDF source.
Clients can only ask ?s ?p ?o patterns.
Complex SPARQL queries are decomposed on the client side.
Low server cost and highly cacheable, but higher bandwidth and query time.
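
The division of labour described above can be sketched as follows. The `fragment` function stands in for the server and answers one triple pattern at a time; `solve` is a naive client-side join that substitutes bindings and asks pattern by pattern. All names and the sample triples are made up for illustration; an actual client would fetch fragments over HTTP, follow paging, and use count metadata to choose a good pattern order.

```python
from typing import Iterator, Optional

Triple = tuple[str, str, str]
Bindings = dict[str, str]

# Made-up sample data, standing in for the RDF source behind the interface.
DATA: list[Triple] = [
    ("ex:picasso", "ex:movement", "ex:cubism"),
    ("ex:braque", "ex:movement", "ex:cubism"),
    ("ex:monet", "ex:movement", "ex:impressionism"),
    ("ex:guernica", "ex:author", "ex:picasso"),
    ("ex:lilies", "ex:author", "ex:monet"),
]


def fragment(s: Optional[str], p: Optional[str], o: Optional[str]) -> Iterator[Triple]:
    """Stand-in for the server: answers a single triple pattern only."""
    for triple in DATA:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


def is_var(term: str) -> bool:
    return term.startswith("?")


def solve(patterns: list[Triple], bindings: Optional[Bindings] = None) -> Iterator[Bindings]:
    """Client-side join: resolve patterns one by one, substituting bindings."""
    bindings = bindings or {}
    if not patterns:
        yield bindings
        return
    first, rest = patterns[0], patterns[1:]
    # Substitute already-bound variables, then request this one pattern.
    bound = [bindings.get(t, t) if is_var(t) else t for t in first]
    request = [None if is_var(t) else t for t in bound]
    for triple in fragment(*request):
        new = dict(bindings)
        consistent = True
        for term, value in zip(bound, triple):
            # A variable used twice in one pattern must match the same value.
            if is_var(term) and new.setdefault(term, value) != value:
                consistent = False
                break
        if consistent:
            yield from solve(rest, new)


# "Which works were made by cubists?" as two triple patterns.
QUERY = [("?artist", "ex:movement", "ex:cubism"), ("?work", "ex:author", "?artist")]
```

The server side stays trivial (one pattern in, matching triples out), while the join logic and its extra requests land on the client, which is the trade-off the slide describes.
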
Usage of fragments.dbpedia.org is steadily increasing.
[Chart: number of requests from February 2015 to September 2016; labeled values 4,500,000 and 19,239,907.]
And still, the API has had 99.99% availability to date.
Uniform access to history with Memento
The Memento Framework lets you negotiate Web resources over time.
Any client can transparently navigate to a prior version.
For archives, interface granularity and design are even more important.
[Figure: interfaces on the client/server trade-off axis. Data dump: no Memento support, high consumer cost. Linked Data pages: Memento support, high consumer cost. SPARQL endpoint: high publisher cost, Memento support difficult.]
The Triple Pattern Fragments trade-off also pays off for archives.
[Figure: the same axis with data dump, Linked Data pages, triple pattern fragments, and SPARQL query results; triple pattern fragments are labeled directly compatible with Memento, useful for the consumer (queryable), and sustainable for the publisher.]
Different HDT snapshots are exposed through an LDF server with Memento.
http://fragments.dbpedia.org
DBpedia pages can be made available through a proxy.
http://dbpedia.org/resource/…
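Preparing the TPF client is as simple as adding an HTTP header.
[Figure: client stack with a Query Engine (SPARQL processing), a Hypermedia Layer (fragments interaction), and an HTTP Layer (resource access), talking to a server that hosts Dataset A and Dataset B. The client sends GET with Accept-Datetime; the server answers 303 with Location, then 200 with Content-Location (CORS).]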
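
The header in question is Memento's `Accept-Datetime`, which carries an HTTP date in GMT. The sketch below only builds such a request with the Python standard library and does not send it; the helper names are made up for illustration, and the example uses the server root because the slides do not give a dataset path.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request


def accept_datetime(when: datetime) -> str:
    """Format a timezone-aware datetime as an HTTP date in GMT."""
    return format_datetime(when.astimezone(timezone.utc), usegmt=True)


def timegate_request(url: str, when: datetime) -> Request:
    """Build (but do not send) a GET request that asks for the state at `when`."""
    return Request(url, headers={"Accept-Datetime": accept_datetime(when)})


# Ask for the state of a resource as of 1 January 2013 (example value).
request = timegate_request(
    "http://fragments.dbpedia.org/",
    datetime(2013, 1, 1, tzinfo=timezone.utc),
)
```

Per the exchange in the figure, a server that supports datetime negotiation would be expected to answer such a request with a 303 redirect to the matching version, which the client then fetches as usual.
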
A self-descriptive interface results in a single datetime negotiation.
[Figure: the same client stack (Query Engine, Hypermedia Layer, HTTP Layer) and server (Dataset A, Dataset B); after the negotiation, requests are plain GET answered with 200.]
Time travelling through DBpedia
There is a huge amount of interesting information in the history of Linked Data.
What could we learn if we could easily query it?
Querying history and the evolution of facts.
When did a researcher named Frederik H. Kreuger, born in Amsterdam, die?
Try it yourself:
bit.ly/frederikkreuger
bit.ly/frederikkreuger-2013
Analyze and profile changes in the data.
What predicates were added in DBpedia between 2009 and 2014 to describe a person?
Try it yourself:
bit.ly/personpredicates-2009
bit.ly/personpredicates-2014
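
Once the same query has been run against two versions, profiling the change comes down to a set difference. The sketch below assumes the results of both runs are already available as lists of triples; the function names and the sample triples are invented placeholders, not actual DBpedia data.

```python
Triple = tuple[str, str, str]


def predicates(triples: list[Triple]) -> set[str]:
    """Collect the distinct predicates used in a list of triples."""
    return {p for _, p, _ in triples}


def added_predicates(old: list[Triple], new: list[Triple]) -> list[str]:
    """Predicates that occur in the newer version but not in the older one."""
    return sorted(predicates(new) - predicates(old))


# Placeholder results for an older and a newer version of the same data.
older = [("ex:person1", "ex:name", "A"), ("ex:person1", "ex:birthPlace", "ex:city1")]
newer = older + [("ex:person1", "ex:deathDate", "2000-01-01")]
```

Swapping the arguments gives the predicates that were removed instead of added.
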
Resolve out-of-sync issues between federated sources.
What works by cubists were known to DBpedia and VIAF in 2009?
Try it yourself:
bit.ly/workscubists-2009
bit.ly/workscubists
Start hosting your own Linked Data archive (or play with the DBpedia one)!
Software, documentation and specification:
github.com/LinkedDataFragments
bit.ly/configuring-memento
www.rdfhdt.org
linkeddatafragments.org
mementoweb.org
Query the DBpedia archive on fragments.mementodepot.org
A sweet affordable combo for Linked Data Archives
@Miel_vds
Herbert Van de Sompel
Harihar Shankar
Lyudmila Balakireva
Ruben Verborgh
