An introduction to SDshare 2011-03-15 Lars Marius Garshol, <larsga@bouvet.no> http://twitter.com/larsga
Overview of SDshare
SDshare: a protocol for tracking changes in a semantic datastore. It essentially allows clients to keep track of all changes, for replication purposes. It supports both Topic Maps and RDF, is based on Atom, is highly RESTful, and is a CEN specification.
Basic workings: The server publishes fragments representing changes in the datastore. The client pulls these in and updates its local copy of the dataset. There is, however, more to it than just this.
What more is needed? Support for more than one dataset per server, which means more than one fragment stream. How do clients get started? A change feed is nice once you've got a copy of the dataset, but how do you get a copy? What if you miss out on some changes and need to restart? There must be a way to reset the local copy. The protocol supports all this.
Two new concepts. Collection: essentially a dataset inside the server. Its exact meaning is not defined in the spec, but it will generally be a topic map (Topic Maps) or a graph (RDF). Snapshot: a complete copy of a collection at some point in time.
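To make the role of the snapshot concrete, here is a minimal sketch of a client that starts from a snapshot and then follows the change feed. The `sync` function, the `state` dictionary, and the statement strings are all hypothetical names for illustration, and fragments are simply merged in rather than applied with the full algorithm.

```python
def sync(state, snapshot, fragments):
    """Bring a local copy up to date.

    state     -- dict with 'data' (set of statements) and 'last_seen' (timestamp or None)
    snapshot  -- (updated, statements): a complete copy of the collection
    fragments -- list of (updated, statements) change entries, in any order
    """
    if state["last_seen"] is None:
        # No local copy yet (or it was reset): start from the snapshot.
        updated, statements = snapshot
        state["data"] = set(statements)
        state["last_seen"] = updated
    # Timestamps in one fixed UTC format sort correctly as plain strings.
    for updated, statements in sorted(fragments, key=lambda f: f[0]):
        if updated > state["last_seen"]:
            state["data"] |= set(statements)  # simplification: merge only
            state["last_seen"] = updated
    return state


state = {"data": set(), "last_seen": None}
snapshot = ("2011-03-15T19:12:34Z", ["s1", "s2"])
fragments = [("2011-03-15T19:20:03Z", ["s3"]), ("2011-03-15T19:00:00Z", ["old"])]
sync(state, snapshot, fragments)
print(sorted(state["data"]), state["last_seen"])
```

Resetting the local copy then amounts to setting `last_seen` back to `None` and syncing again.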
Feeds in the server (diagram): the overview feed links to the collection feeds; each collection feed links to a snapshot feed (listing snapshots) and a fragment feed (listing fragments).
An overview feed
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:sdshare="http://www.egovpt.org/sdshare">
  <title>SDshare feeds from localhost</title>
  <updated>2011-03-15T18:55:38Z</updated>
  <author>
    <name>Ontopia SDshare server</name>
  </author>
  <id>http://localhost:8080/sdshare/</id>
  <link href="http://localhost:8080/sdshare/"></link>
  <entry>
    <title>beer.xtm</title>
    <updated>2011-03-15T18:55:38Z</updated>
    <id>http://localhost:8080/sdshare/beer.xtm</id>
    <link href="collection.jsp?topicmap=beer.xtm" type="application/atom+xml" rel="http://www.egovpt.org/sdshare/collectionfeed"></link>
  </entry>
  <entry>
    <title>metadata.xtm</title>
    <updated>2011-03-15T18:55:38Z</updated>
    <id>http://localhost:8080/sdshare/metadata.xtm</id>
    <link href="collection.jsp?topicmap=metadata.xtm" type="application/atom+xml" rel="http://www.egovpt.org/sdshare/collectionfeed"></link>
  </entry>
</feed>
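A client might discover the collection feeds from an overview feed roughly as follows. This is a sketch using only the Python standard library; the function name `collection_feeds` and the shortened sample feed are illustrative assumptions, while the namespace and `rel` value are taken from the example feed.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

ATOM = "{http://www.w3.org/2005/Atom}"
COLLECTION_REL = "http://www.egovpt.org/sdshare/collectionfeed"


def collection_feeds(overview_xml, base_url):
    """Return (title, absolute URL) for each collection feed in an overview feed."""
    root = ET.fromstring(overview_xml)
    result = []
    for entry in root.findall(ATOM + "entry"):
        title = entry.findtext(ATOM + "title", default="")
        for link in entry.findall(ATOM + "link"):
            if link.get("rel") == COLLECTION_REL:
                # hrefs in the feed are relative, so resolve them against the feed URL
                result.append((title, urljoin(base_url, link.get("href"))))
    return result


# Shortened version of the overview feed shown on the slide.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom" xmlns:sdshare="http://www.egovpt.org/sdshare">
  <title>SDshare feeds from localhost</title>
  <entry>
    <title>beer.xtm</title>
    <link href="collection.jsp?topicmap=beer.xtm" type="application/atom+xml" rel="http://www.egovpt.org/sdshare/collectionfeed"></link>
  </entry>
</feed>"""

feeds = collection_feeds(SAMPLE, "http://localhost:8080/sdshare/")
print(feeds)
```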
The snapshot feed: a list of links to snapshots of the entire dataset (collection). The spec doesn't say anything about how and when snapshots are produced; it's up to implementations to decide how they want to do this. It makes sense, though, to always have a snapshot for the current state of the dataset.
Example snapshot feed
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:sdshare="http://www.egovpt.org/sdshare">
  <title>Snapshots feed for beer.xtm</title>
  <updated>2011-03-15T19:12:34Z</updated>
  <author>
    <name>Ontopia SDshare server</name>
  </author>
  <id>file:/Users/larsga/data/topicmaps/beer.xtm/snapshots</id>
  <sdshare:ServerSrcLocatorPrefix>file:/Users/larsga/data/topicmaps/beer.xtm</sdshare:ServerSrcLocatorPrefix>
  <entry>
    <title>Snapshot of beer.xtm</title>
    <updated>2011-03-15T19:12:34Z</updated>
    <id>file:/Users/larsga/data/topicmaps/beer.xtm/snapshot/0</id>
    <link href="snapshot.jsp?topicmap=beer.xtm" type="application/x-tm+xml; version=1.0" rel="alternate"></link>
  </entry>
</feed>
The fragment feed: for every change in the topic map, there is one fragment. The granularity of changes is not defined by the spec; it could be per transaction, or per topic changed. The fragment is basically a link to a URL that produces a part of the dataset.
An example fragment feed
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:sdshare="http://www.egovpt.org/sdshare">
  <title>Fragments feed for beer.xtm</title>
  <updated>2011-03-15T19:21:20Z</updated>
  <author>
    <name>Ontopia SDshare server</name>
  </author>
  <id>file:/Users/larsga/data/topicmaps/beer.xtm/fragments</id>
  <sdshare:ServerSrcLocatorPrefix>file:/Users/larsga/data/topicmaps/beer.xtm</sdshare:ServerSrcLocatorPrefix>
  <entry>
    <title>Topic with object ID 4521</title>
    <updated>2011-03-15T19:20:03Z</updated>
    <id>file:/Users/larsga/data/topicmaps/beer.xtm/4521/1300216803730</id>
    <link href="fragment.jsp?topicmap=beer.xtm&amp;topic=4521&amp;syntax=rdf" type="application/rdf+xml" rel="alternate"/>
    <link href="fragment.jsp?topicmap=beer.xtm&amp;topic=4521&amp;syntax=xtm" type="application/x-tm+xml; version=1.0" rel="alternate"/>
    <sdshare:TopicSI>http://psi.example.org/12</sdshare:TopicSI>
  </entry>
</feed>
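The pieces a client needs from such a feed are the source prefix, and for each entry the timestamp, the TopicSI values, and the fragment links per syntax. A minimal sketch of extracting them with the standard library follows; the function name `read_fragment_feed` and the returned dictionary layout are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
SD = "{http://www.egovpt.org/sdshare}"


def read_fragment_feed(feed_xml):
    """Return (source prefix, list of fragment descriptions) for a fragment feed."""
    root = ET.fromstring(feed_xml)
    prefix = root.findtext(SD + "ServerSrcLocatorPrefix")
    fragments = []
    for entry in root.findall(ATOM + "entry"):
        fragments.append({
            "updated": entry.findtext(ATOM + "updated"),
            "topic_sis": [e.text for e in entry.findall(SD + "TopicSI")],
            # one alternate link per syntax, keyed by MIME type
            "links": {
                link.get("type"): link.get("href")
                for link in entry.findall(ATOM + "link")
                if link.get("rel") == "alternate"
            },
        })
    return prefix, fragments


# Shortened version of the fragment feed shown on the slide.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom" xmlns:sdshare="http://www.egovpt.org/sdshare">
  <sdshare:ServerSrcLocatorPrefix>file:/Users/larsga/data/topicmaps/beer.xtm</sdshare:ServerSrcLocatorPrefix>
  <entry>
    <updated>2011-03-15T19:20:03Z</updated>
    <link href="fragment.jsp?topicmap=beer.xtm&amp;topic=4521&amp;syntax=rdf" type="application/rdf+xml" rel="alternate"/>
    <link href="fragment.jsp?topicmap=beer.xtm&amp;topic=4521&amp;syntax=xtm" type="application/x-tm+xml; version=1.0" rel="alternate"/>
    <sdshare:TopicSI>http://psi.example.org/12</sdshare:TopicSI>
  </entry>
</feed>"""

prefix, fragments = read_fragment_feed(SAMPLE)
print(prefix)
print(fragments[0]["topic_sis"], fragments[0]["links"]["application/rdf+xml"])
```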
What is a fragment? Essentially, a piece of a topic map: that is, a complete XTM file that contains only part of a bigger topic map. Typically, most of the topic references will point to topics not in the XTM file. Downloading more fragments will yield a bigger subset of the topic map, and the automatic merging in Topic Maps will cause the fragments to match up. Exactly the same applies in RDF.
An example fragment
<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/" xmlns:xlink="http://www.w3.org/1999/xlink">
  <topic id="id4521">
    <instanceOf>
      <subjectIndicatorRef xlink:href="http://psi.garshol.priv.no/beer/pub"></subjectIndicatorRef>
    </instanceOf>
    <subjectIdentity>
      <subjectIndicatorRef xlink:href="http://psi.example.org/12"></subjectIndicatorRef>
      <topicRef xlink:href="file:/Users/larsga/data/topicmaps/beer.xtm#id2662"></topicRef>
    </subjectIdentity>
    <baseName>
      <baseNameString>Amundsen Bryggeri og Spiseri</baseNameString>
    </baseName>
    <occurrence>
      <instanceOf>
        <subjectIndicatorRef xlink:href="http://psi.ontopia.net/ontology/latitude"></subjectIndicatorRef>
      </instanceOf>
      <resourceData>59.913816</resourceData>
    </occurrence>
    ...
  </topic>
  ...
</topicMap>
Applying a fragment: The feed contains a URI prefix, which is used to create item identifiers tagging statements with their origin. For each TopicSI, find that topic; then, for each of its statements, remove the matching item identifier, and if the statement now has no item identifiers, delete it. Merge in the received fragment, then tag all statements in it with the matching item identifier.
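The steps above can be sketched over a toy in-memory store. This is an illustration under simplifying assumptions, not the Ontopia implementation: statements are modelled as `(topic SI, property, value)` tuples, the store maps each statement to a set of origin tags, and the source prefix itself stands in for the item identifier derived from it. The second source URI and the property names are made up.

```python
def apply_fragment(store, prefix, topic_sis, fragment_statements):
    """Apply one fragment to store (dict: statement tuple -> set of origin tags)."""
    # Step 1: for every statement of a changed topic, drop this source's tag;
    # delete the statement if no source vouches for it any more.
    for stmt in list(store):
        if stmt[0] in topic_sis:
            store[stmt].discard(prefix)
            if not store[stmt]:
                del store[stmt]
    # Step 2: merge in the fragment and tag its statements with this source.
    for stmt in fragment_statements:
        store.setdefault(stmt, set()).add(prefix)
    return store


A = "file:/Users/larsga/data/topicmaps/beer.xtm"  # prefix from the example feed
B = "http://other.example.org/tm"                  # hypothetical second source
pub = "http://psi.example.org/12"

store = {
    (pub, "name", "Amundsen"): {A},
    (pub, "latitude", "59.913816"): {A, B},
    (pub, "phone", "123"): {B},
}
fragment = [
    (pub, "name", "Amundsen Bryggeri og Spiseri"),
    (pub, "latitude", "59.913816"),
]

apply_fragment(store, A, {pub}, fragment)
once = {stmt: set(tags) for stmt, tags in store.items()}
apply_fragment(store, A, {pub}, fragment)  # applying again changes nothing
print(store == once)
```

In this run the outdated name from source A disappears, the statement that only source B contributed survives, and the shared statement keeps both tags, which is what lets a client merge data from several sources safely.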
Properties of the protocol. HATEOAS: it uses hypertext principles; the only endpoint is that of the overview feed, and all other URLs are available via hypertext. Applying a fragment is idempotent, i.e. the result is the same no matter how many times you do it. Loose binding: there is only a very loose binding between server and client. Supports federation of data: a client can safely merge data from different sources.
SDshare push: In normal SDshare, data receivers connect to the data source; basically, they poll the source with GET requests. However, the receiver is not always allowed to make connections to the source, and SDshare push is designed for this situation. The solution is a slightly modified protocol in which the source POSTs Atom feeds with inline fragments to the recipient; this flips the server/client relationship. It is not part of the spec; it is an unofficial Ontopia extension.
Uses of SDshare
Example use case #1 (diagram): Frontend, Database, Ontopia, DB2TM, JDBC, Portal
Example use case #1 (diagram): Service #1, Frontend, Database, Ontopia, DB2TM, SDshare, Ontopia, SDshare, Service #3, Portal, ESB
NRK/Skole today (diagram): Production environment, Editorial server, MediaDB, Prod #1, Prod #2, DB2TM, Export, JDBC, JDBC, nrk-grep.xtm, Import, DB server 1, DB server 2, Database, Firewall, Server
NRK/Skole with SDshare push (diagram): Production environment, SDshare PUSH, Editorial server, MediaDB, Prod #1, Prod #2, DB2TM, JDBC, JDBC, DB server 1, DB server 2, Database, Firewall, Server
Hafslund (diagram): ERP, GIS, CRM, ..., UMIC, Search engine, Archive
Hafslund architecture: The beauty of this architecture is that SDshare insulates the different systems from one another. More input systems can be added without hassle, and any component can be replaced without affecting the others. Essentially, a plug-and-play architecture.
A Hafslund problem: There are too many duplicates in the data, both within each system and across systems. How to get rid of the duplicates? It is unrealistic to expect cleanup across systems. So, we build a deduplicator and plug it in...
DuKe plugged in (diagram): ERP, GIS, CRM, ..., UMIC, Search engine, Dupe Killer, Archive
Implementations
Current implementations. Web3: both client and server. Ontopia: ditto, plus SDshare push. Isidorus: don't know. Atomico: server framework only; no actual implementation.
Ontopia SDshare server. Event tracker: taps into the event API, where it listens for changes; maintains an in-memory list of changes; writes all changes to disk as well; removes duplicate changes and discards old changes. Web application based on the tracker: JSP pages producing feeds and fragments; one fragment per changed topic, sorted by time; only a single snapshot of the current state of the TM.
Ontopia SDshare client. Web UI for management. Pluggable frontends and pluggable backends; combine at will. Frontends: Ontopia (event listener), SDshare (polls Atom feeds). Backends: Ontopia (applies changes to Ontopia locally), SPARQL (writes changes to an RDF repository via SPARUL), push (pushes changes over SDshare push). Diagram: Web UI, Ontopia events, Core logic, Ontopia backend, SPARQL Update, SDshare client, SDshare push.
Web UI to client
Problems with the spec
What if many fragments? The size of the fragments feed grows enormous, which is expensive if it is polled frequently. Paging might be one solution: basically, the end of the feed contains a pointer to more. A "since" parameter might be another: it allows the client to say "only show me changes since ...". Probably need both in practice. http://projects.topicmapslab.de/issues/3675
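How the two proposals could combine on the server side is sketched below. Neither paging nor "since" is defined by the spec at the time of the talk, so everything here is hypothetical: the function name `page_of_fragments`, the `page_size` and `page` parameters, and the `has_more` flag standing in for the "pointer to more".

```python
from datetime import datetime, timezone


def parse_ts(text):
    """Parse the UTC timestamp format used in the example feeds."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def page_of_fragments(entries, since=None, page_size=2, page=0):
    """entries: list of (updated, fragment id), newest first.

    Returns (entries on the requested page, whether more pages follow).
    """
    if since is not None:
        cutoff = parse_ts(since)
        # "since": only show changes strictly newer than the client's timestamp
        entries = [e for e in entries if parse_ts(e[0]) > cutoff]
    start = page * page_size
    chunk = entries[start:start + page_size]
    has_more = start + page_size < len(entries)
    return chunk, has_more


entries = [
    ("2011-03-15T19:21:20Z", "f3"),
    ("2011-03-15T19:20:03Z", "f2"),
    ("2011-03-15T18:55:38Z", "f1"),
]
first, more = page_of_fragments(entries, since="2011-03-15T19:00:00Z", page_size=1, page=0)
second, more_after = page_of_fragments(entries, since="2011-03-15T19:00:00Z", page_size=1, page=1)
print(first, more)
print(second, more_after)
```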
Ordering of fragments: Should the spec require that fragments be ordered? Not really necessary if all fragment URIs return the current state (instead of the state at the time the fragment entry was created).
RDF fragment algorithm: The one given in the spec makes no sense. It relies on Topic Maps constructs not found in RDF, so there is really no way to make use of it. http://projects.topicmapslab.de/issues/4013
Our interpretation: The server prefix is the URI of an RDF named graph. The fragment algorithm therefore becomes: delete all statements about changed resources, then add all statements in the fragment. This means each source gets a different graph.
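This interpretation can be sketched with plain Python sets standing in for an RDF store. The assumptions are loud ones: a dataset is a dictionary from named-graph URI to a set of `(subject, predicate, object)` triples, "statements about a resource" is read as triples with that resource as subject, and the predicate names and second graph URI are invented for the example.

```python
def apply_rdf_fragment(dataset, graph_uri, changed_resources, fragment_triples):
    """Replace everything the named graph says about the changed resources."""
    graph = dataset.setdefault(graph_uri, set())
    # delete all statements about changed resources (subject position only here)
    graph -= {t for t in graph if t[0] in changed_resources}
    # then add all statements in the fragment
    graph |= set(fragment_triples)
    return dataset


G1 = "file:/Users/larsga/data/topicmaps/beer.xtm"  # server prefix used as graph URI
G2 = "http://other.example.org/graph"              # hypothetical second source
pub = "http://psi.example.org/12"

dataset = {
    G1: {(pub, "name", "Amundsen"), ("http://psi.example.org/13", "name", "Other pub")},
    G2: {(pub, "phone", "123")},
}
fragment = {(pub, "name", "Amundsen Bryggeri og Spiseri")}

apply_rdf_fragment(dataset, G1, {pub}, fragment)
print(sorted(dataset[G1]))
print(sorted(dataset[G2]))
```

Because each source writes only to its own graph, the delete step never touches statements that came from another source.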
TopicSL/TopicII: Currently, topics can only be identified by subject identifier, but not all topics have one. Solution: add elements for subject locators and item identifiers. http://projects.topicmapslab.de/issues/3667
Paging of snapshots? What if the snapshot is vast? Clients probably won't be able to download and store the entire thing in one go. Could we page the snapshot into fragments? Or is there some other solution? http://projects.topicmapslab.de/issues/4307
How to tell if the fragment feed is complete? When reading the fragment feed, how can we tell if there are older fragments that have been discarded, and how can we tell which fragment was the newest to be thrown away? Without this there's no way to know for certain if you've lost fragments if the feed stops before the newest fragment you've got, and if you're using "since" it always will stop before the newest fragment... Make a new sdshare:foo element on feed level for this information? http://projects.topicmapslab.de/issues/4308
Blank nodes are not supported. What to do? http://projects.topicmapslab.de/issues/4306
More information. SDshare spec: http://www.egovpt.org/fg/CWA_Part_1b SDshare issue tracker: http://projects.topicmapslab.de/projects/sdshare SDshare use cases: http://www.garshol.priv.no/blog/215.html

Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
 

Introduction to SDshare

  • 1. An introduction to SDshare 2011-03-15 Lars Marius Garshol, <larsga@bouvet.no> http://twitter.com/larsga
  • 3. SDshare: A protocol for tracking changes in a semantic datastore. Essentially allows clients to keep track of all changes, for replication purposes. Supports both Topic Maps and RDF. Based on Atom. Highly RESTful. A CEN specification.
  • 4. Basic workings Server Client Fragment Fragment Fragment Fragment Client pulls these in, updates local copy of dataset Server publishes fragments representing changes in datastore There is, however, more to it than just this
  • 5. What more is needed? Support for more than one dataset per server; this means more than one fragment stream. How do clients get started? A change feed is nice once you've got a copy of the dataset, but how do you get a copy? What if you miss out on some changes and need to restart? There must be a way to reset the local copy. The protocol supports all this.
  • 6. Two new concepts. Collection: essentially a dataset inside the server; exact meaning is not defined in the spec; will generally be a topic map (TMs) or a graph (RDF). Snapshot: a complete copy of a collection at some point in time.
  • 7. Feeds in the server Snapshot Snapshot feed Overview feed Fragment Fragment feed Collection feeds
  • 8. An overview feed
    <feed xmlns="http://www.w3.org/2005/Atom" xmlns:sdshare="http://www.egovpt.org/sdshare">
      <title>SDshare feeds from localhost</title>
      <updated>2011-03-15T18:55:38Z</updated>
      <author>
        <name>Ontopia SDshare server</name>
      </author>
      <id>http://localhost:8080/sdshare/</id>
      <link href="http://localhost:8080/sdshare/"></link>
      <entry>
        <title>beer.xtm</title>
        <updated>2011-03-15T18:55:38Z</updated>
        <id>http://localhost:8080/sdshare/beer.xtm</id>
        <link href="collection.jsp?topicmap=beer.xtm" type="application/atom+xml" rel="http://www.egovpt.org/sdshare/collectionfeed"></link>
      </entry>
      <entry>
        <title>metadata.xtm</title>
        <updated>2011-03-15T18:55:38Z</updated>
        <id>http://localhost:8080/sdshare/metadata.xtm</id>
        <link href="collection.jsp?topicmap=metadata.xtm" type="application/atom+xml" rel="http://www.egovpt.org/sdshare/collectionfeed"></link>
      </entry>
    </feed>
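A client starts from the overview feed and follows its links. The following is a minimal sketch, not part of the slides, of how a client might pull the collection feed URLs out of such a feed using only the Python standard library. The function name `collection_feeds` and the shortened sample document are illustrative assumptions; the namespace and `rel` value are taken from the example above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

ATOM = "{http://www.w3.org/2005/Atom}"
# rel value used by the example overview feed to mark collection feeds
COLLECTION_REL = "http://www.egovpt.org/sdshare/collectionfeed"

# Shortened copy of the overview feed from the slide (illustrative sample data)
OVERVIEW = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:sdshare="http://www.egovpt.org/sdshare">
  <title>SDshare feeds from localhost</title>
  <id>http://localhost:8080/sdshare/</id>
  <entry>
    <title>beer.xtm</title>
    <link href="collection.jsp?topicmap=beer.xtm" type="application/atom+xml"
          rel="http://www.egovpt.org/sdshare/collectionfeed"></link>
  </entry>
  <entry>
    <title>metadata.xtm</title>
    <link href="collection.jsp?topicmap=metadata.xtm" type="application/atom+xml"
          rel="http://www.egovpt.org/sdshare/collectionfeed"></link>
  </entry>
</feed>"""


def collection_feeds(overview_xml, base_url):
    """Return (title, absolute collection feed URL) pairs from an overview feed.

    Hypothetical helper: relative hrefs are resolved against base_url,
    which is assumed to be the URL the overview feed was fetched from.
    """
    root = ET.fromstring(overview_xml)
    feeds = []
    for entry in root.findall(ATOM + "entry"):
        title = entry.findtext(ATOM + "title", default="").strip()
        for link in entry.findall(ATOM + "link"):
            if link.get("rel") == COLLECTION_REL:
                feeds.append((title, urljoin(base_url, link.get("href"))))
    return feeds


feeds = collection_feeds(OVERVIEW, "http://localhost:8080/sdshare/")
```

Each returned URL points at a collection feed, which in turn links to the snapshot and fragment feeds for that collection.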
  • 9. The snapshot feed: A list of links to snapshots of the entire dataset (collection). The spec doesn't say anything about how and when snapshots are produced; it's up to implementations to decide how they want to do this. It makes sense, though, to always have a snapshot for the current state of the dataset.
  • 10. Example snapshot feed
    <feed xmlns="http://www.w3.org/2005/Atom" xmlns:sdshare="http://www.egovpt.org/sdshare">
      <title>Snapshots feed for beer.xtm</title>
      <updated>2011-03-15T19:12:34Z</updated>
      <author>
        <name>Ontopia SDshare server</name>
      </author>
      <id>file:/Users/larsga/data/topicmaps/beer.xtm/snapshots</id>
      <sdshare:ServerSrcLocatorPrefix>file:/Users/larsga/data/topicmaps/beer.xtm</sdshare:ServerSrcLocatorPrefix>
      <entry>
        <title>Snapshot of beer.xtm</title>
        <updated>2011-03-15T19:12:34Z</updated>
        <id>file:/Users/larsga/data/topicmaps/beer.xtm/snapshot/0</id>
        <link href="snapshot.jsp?topicmap=beer.xtm" type="application/x-tm+xml; version=1.0" rel="alternate"></link>
      </entry>
    </feed>
  • 11. The fragment feed: For every change in the topic map, there is one fragment. The granularity of changes is not defined by the spec; it could be per transaction, or per topic changed. The fragment is basically a link to a URL that produces a part of the dataset.
  • 12. An example fragment feed
    <feed xmlns="http://www.w3.org/2005/Atom" xmlns:sdshare="http://www.egovpt.org/sdshare">
      <title>Fragments feed for beer.xtm</title>
      <updated>2011-03-15T19:21:20Z</updated>
      <author>
        <name>Ontopia SDshare server</name>
      </author>
      <id>file:/Users/larsga/data/topicmaps/beer.xtm/fragments</id>
      <sdshare:ServerSrcLocatorPrefix>file:/Users/larsga/data/topicmaps/beer.xtm</sdshare:ServerSrcLocatorPrefix>
      <entry>
        <title>Topic with object ID 4521</title>
        <updated>2011-03-15T19:20:03Z</updated>
        <id>file:/Users/larsga/data/topicmaps/beer.xtm/4521/1300216803730</id>
        <link href="fragment.jsp?topicmap=beer.xtm&amp;topic=4521&amp;syntax=rdf" type="application/rdf+xml" rel="alternate"/>
        <link href="fragment.jsp?topicmap=beer.xtm&amp;topic=4521&amp;syntax=xtm" type="application/x-tm+xml; version=1.0" rel="alternate"/>
        <sdshare:TopicSI>http://psi.example.org/12</sdshare:TopicSI>
      </entry>
    </feed>
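To act on a fragment feed, a client needs three things from it: the server prefix, the subject identifiers of the changed topics, and the fragment links in a syntax it understands. The sketch below shows one way to read these with the Python standard library; the function name `read_fragment_feed` and the returned dictionary layout are assumptions made for illustration, and the sample is a trimmed copy of the feed above.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
SDSHARE = "{http://www.egovpt.org/sdshare}"

# Trimmed copy of the example fragment feed (illustrative sample data)
FRAGMENT_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:sdshare="http://www.egovpt.org/sdshare">
  <title>Fragments feed for beer.xtm</title>
  <sdshare:ServerSrcLocatorPrefix>file:/Users/larsga/data/topicmaps/beer.xtm</sdshare:ServerSrcLocatorPrefix>
  <entry>
    <title>Topic with object ID 4521</title>
    <updated>2011-03-15T19:20:03Z</updated>
    <link href="fragment.jsp?topicmap=beer.xtm&amp;topic=4521&amp;syntax=rdf"
          type="application/rdf+xml" rel="alternate"/>
    <link href="fragment.jsp?topicmap=beer.xtm&amp;topic=4521&amp;syntax=xtm"
          type="application/x-tm+xml; version=1.0" rel="alternate"/>
    <sdshare:TopicSI>http://psi.example.org/12</sdshare:TopicSI>
  </entry>
</feed>"""


def read_fragment_feed(feed_xml):
    """Return (server prefix, list of fragment descriptions).

    Hypothetical helper: each description holds the entry's timestamp,
    its TopicSI values, and its alternate links keyed by media type.
    """
    root = ET.fromstring(feed_xml)
    prefix = root.findtext(SDSHARE + "ServerSrcLocatorPrefix")
    fragments = []
    for entry in root.findall(ATOM + "entry"):
        links = {
            link.get("type"): link.get("href")
            for link in entry.findall(ATOM + "link")
            if link.get("rel") == "alternate"
        }
        fragments.append({
            "updated": entry.findtext(ATOM + "updated"),
            "topic_sis": [e.text for e in entry.findall(SDSHARE + "TopicSI")],
            "links": links,
        })
    return prefix, fragments


prefix, fragments = read_fragment_feed(FRAGMENT_FEED)
```

A client that only handles RDF would then fetch `fragments[0]["links"]["application/rdf+xml"]`, resolved against the feed URL.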
  • 13. What is a fragment? Essentially, a piece of a topic map; that is, a complete XTM file that contains only part of a bigger topic map. Typically, most of the topic references will point to topics not in the XTM file. Downloading more fragments will yield a bigger subset of the topic map; the automatic merging in Topic Maps will cause the fragments to match up. Exactly the same applies in RDF.
  • 14. An example fragment
    <topicMap xmlns="http://www.topicmaps.org/xtm/1.0/" xmlns:xlink="http://www.w3.org/1999/xlink">
      <topic id="id4521">
        <instanceOf>
          <subjectIndicatorRef xlink:href="http://psi.garshol.priv.no/beer/pub"></subjectIndicatorRef>
        </instanceOf>
        <subjectIdentity>
          <subjectIndicatorRef xlink:href="http://psi.example.org/12"></subjectIndicatorRef>
          <topicRef xlink:href="file:/Users/larsga/data/topicmaps/beer.xtm#id2662"></topicRef>
        </subjectIdentity>
        <baseName>
          <baseNameString>Amundsen Bryggeri og Spiseri</baseNameString>
        </baseName>
        <occurrence>
          <instanceOf>
            <subjectIndicatorRef xlink:href="http://psi.ontopia.net/ontology/latitude"></subjectIndicatorRef>
          </instanceOf>
          <resourceData>59.913816</resourceData>
        </occurrence>
        ...
      </topic>
      ...
    </topicMap>
  • 15. Applying a fragment: The feed contains a URI prefix; this is used to create item identifiers tagging statements with their origin. For each TopicSI, find that topic, then for each statement, remove the matching item identifier; if the statement now has no item identifiers, delete it. Merge in the received fragment, then tag all statements in it with a matching item identifier.
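The steps on this slide can be sketched over a deliberately simplified in-memory model. This is not the Ontopia implementation: the store layout (topic subject identifier mapped to statements, each statement mapped to a set of origin tags), the use of the bare prefix as the origin tag instead of a full item identifier, and the name `apply_fragment` are all assumptions made to keep the example short.

```python
import copy


def apply_fragment(store, prefix, topic_sis, fragment):
    """Apply one fragment to a toy statement store, in place.

    store:     {topic_si: {statement: set of origin tags}}
    prefix:    the feed's URI prefix, used here directly as the origin tag
    topic_sis: TopicSI values listed in the fragment feed entry
    fragment:  {topic_si: [statement, ...]} as received from the server
    """
    # Step 1: for each changed topic, untag this source's statements and
    # delete any statement that no longer has an origin.
    for si in topic_sis:
        statements = store.get(si, {})
        for stmt in list(statements):
            statements[stmt].discard(prefix)
            if not statements[stmt]:
                del statements[stmt]
    # Step 2: merge in the fragment and tag its statements with the origin.
    for si, stmts in fragment.items():
        topic = store.setdefault(si, {})
        for stmt in stmts:
            topic.setdefault(stmt, set()).add(prefix)
    return store


# Two hypothetical sources, "A" and "B", describe the same topic.
store = {}
apply_fragment(store, "A", ["si:pub"], {"si:pub": [("name", "Amundsen"), ("lat", "59.9")]})
apply_fragment(store, "B", ["si:pub"], {"si:pub": [("name", "Amundsen")]})

# Source A publishes a change: new name, latitude dropped.
update = {"si:pub": [("name", "Amundsen Bryggeri")]}
apply_fragment(store, "A", ["si:pub"], update)
once = copy.deepcopy(store)

# Applying the same fragment again leaves the store unchanged.
apply_fragment(store, "A", ["si:pub"], update)
```

In this model the statement that source B also asserted survives A's update, because it still carries B's tag, and re-applying the same fragment produces the same store.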
  • 16. Properties of the protocol. HATEOAS: uses hypertext principles; the only endpoint is that of the overview feed, and all other URLs are available via hypertext. Applying a fragment is idempotent, i.e. the result is the same no matter how many times you do it. Loose binding: very loose binding between server and client. Supports federation of data: a client can safely merge data from different sources.
  • 17. SDshare push: In normal SDshare, data receivers connect to the data source; basically, they poll the source with GET requests. However, the receiver is not always allowed to make connections to the source. SDshare push is designed for this situation. The solution is a slightly modified protocol: the source POSTs Atom feeds with inline fragments to the recipient, which flips the server/client relationship. Not part of the spec; unofficial Ontopia extension.
  • 19. Example use case #1 Frontend Database Ontopia DB2TM JDBC Portal
  • 20. Example use case #1 Service #1 Frontend Database Ontopia DB2TM SDshare Ontopia SDshare Service #3 Portal ESB
  • 21. NRK/Skole today Production environment Editorial server MediaDB Prod #1 Prod #2 DB2TM Export JDBC JDBC nrk-grep.xtm Import DB server 1 DB server 2 Database Firewall Server
  • 22. NRK/Skole with SDshare push Production environment SDshare PUSH Editorial server MediaDB Prod #1 Prod #2 DB2TM JDBC JDBC DB server 1 DB server 2 Database Firewall Server
  • 23. Hafslund ERP GIS CRM ... UMIC Search engine Archive
  • 24. Hafslund architecture: The beauty of this architecture is that SDshare insulates the different systems from one another. More input systems can be added without hassle. Any component can be replaced without affecting the others. Essentially, a plug-and-play architecture.
  • 25. A Hafslund problem: There are too many duplicates in the data: duplicates within each system, and also duplication across systems. How to get rid of the duplicates? It is unrealistic to expect cleanup across systems. So, we build a deduplicator and plug it in...
  • 26. DuKe plugged in ERP GIS CRM ... UMIC Search engine Dupe Killer Archive
  • 28. Current implementations. Web3: both client and server. Ontopia: ditto + SDshare push. Isidorus: don't know. Atomico: server framework only; no actual implementation.
  • 29. Ontopia SDshare server. Event tracker: taps into the event API, where it listens for changes; maintains an in-memory list of changes; writes all changes to disk as well; removes duplicate changes and discards old changes. Web application based on the tracker: JSP pages producing feeds and fragments; one fragment per changed topic, sorted by time; only a single snapshot of the current state of the TM.
  • 30. Ontopia SDshare client. Web UI for mgmt. Pluggable frontends and pluggable backends; combine at will. Frontends: Ontopia (event listener); SDshare (polls Atom feeds). Backends: Ontopia (applies changes to Ontopia locally); SPARQL (writes changes to RDF repo via SPARUL); push (pushes changes over SDshare push). [Diagram: Web UI, Ontopia events, Core logic, Ontopia backend, SPARQL Update, SDshare client, SDshare push]
  • 31. Web UI to client
  • 33. What if many fragments? The size of the fragments feed grows enormous; expensive if polled frequently. Paging might be one solution: basically, the end of the feed contains a pointer to more. A "since" parameter might be another: it allows the client to say "only show me changes since ...". Probably need both in practice. http://projects.topicmapslab.de/issues/3675
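Neither paging nor "since" is defined by the spec at this point, so the following is only a sketch of how a server might combine the two when building a fragment feed. The function `select_fragments`, its parameters, and the `(updated, fragment id)` entry layout are hypothetical; timestamps are assumed to use the same UTC format as the feed examples.

```python
from datetime import datetime, timezone


def parse_ts(ts):
    """Parse an Atom-style UTC timestamp such as 2011-03-15T19:20:03Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def select_fragments(entries, since=None, page_size=None, start=0):
    """Pick the fragment entries to include in one feed response.

    entries:   list of (updated, fragment_id), newest first
    since:     if given, keep only entries strictly newer than this timestamp
    page_size: if given, return at most this many entries
    start:     index of the first entry of the requested page
    Returns (page, next_start); next_start is None when nothing is left,
    otherwise it is what a "more" pointer at the end of the feed would encode.
    """
    if since is not None:
        cutoff = parse_ts(since)
        entries = [e for e in entries if parse_ts(e[0]) > cutoff]
    if page_size is None:
        return entries[start:], None
    page = entries[start:start + page_size]
    more = start + page_size < len(entries)
    return page, (start + page_size if more else None)


entries = [
    ("2011-03-15T19:20:03Z", "c"),
    ("2011-03-15T19:10:00Z", "b"),
    ("2011-03-15T19:00:00Z", "a"),
]
recent, _ = select_fragments(entries, since="2011-03-15T19:05:00Z")
first_page, next_start = select_fragments(entries, page_size=2)
```

With "since" a well-behaved client rarely needs more than one page, while paging bounds the response size for a client that has been away for a long time, which is one reason both may be needed.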
  • 34. Ordering of fragments: Should the spec require that fragments be ordered? Not really necessary if all fragment URIs return current state (instead of state at the time the fragment entry was created).
  • 35. RDF fragment algorithm: The one given in the spec makes no sense. It relies on Topic Maps constructs not found in RDF. Really no way to make use of it. http://projects.topicmapslab.de/issues/4013
  • 36. Our interpretation: The server prefix is the URI of an RDF named graph. The fragment algorithm therefore becomes: delete all statements about changed resources, then add all statements in the fragment. Means each source gets a different graph.
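This interpretation can be sketched with a toy quad store, where each statement is a `(graph, subject, predicate, object)` tuple. The sketch assumes that "statements about a resource" means statements with that resource as subject, and that deletion is limited to the source's own named graph; both are readings of the slide rather than anything it states outright, and `apply_rdf_fragment` is a hypothetical name.

```python
def apply_rdf_fragment(quads, graph, changed_resources, fragment_triples):
    """Return a new quad set with one RDF fragment applied.

    quads:             set of (graph, subject, predicate, object) tuples
    graph:             the server prefix, used as the named graph URI
    changed_resources: URIs of the resources the fragment entry reports as changed
    fragment_triples:  iterable of (subject, predicate, object) from the fragment
    """
    changed = set(changed_resources)
    # Delete this source's statements about the changed resources.
    kept = {q for q in quads if not (q[0] == graph and q[1] in changed)}
    # Add every statement in the fragment to this source's graph.
    added = {(graph, s, p, o) for (s, p, o) in fragment_triples}
    return kept | added


quads = {
    ("graph:erp", "ex:cust1", "ex:name", "Ola"),
    ("graph:erp", "ex:cust1", "ex:phone", "123"),
    ("graph:crm", "ex:cust1", "ex:name", "Ola N."),
}
fragment = [("ex:cust1", "ex:name", "Ola Nordmann")]
updated = apply_rdf_fragment(quads, "graph:erp", ["ex:cust1"], fragment)
```

Because each source writes only to its own graph, the CRM statement in this example is untouched by the ERP update, and applying the same fragment twice yields the same result.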
  • 37. TopicSL/TopicII: Currently, topics can only be identified by subject identifier, but not all topics have one. Solution: add elements for subject locators and item identifiers. http://projects.topicmapslab.de/issues/3667
  • 38. Paging of snapshots? What if the snapshot is vast? Clients probably won't be able to download and store the entire thing in one go. Could we page the snapshot into fragments? Or is there some other solution? http://projects.topicmapslab.de/issues/4307
  • 39. How to tell if the fragment feed is complete? When reading the fragment feed, how can we tell if there are older fragments that have been discarded, and how can we tell which fragment was the newest to be thrown away? Without this there's no way to know for certain if you've lost fragments if the feed stops before the newest fragment you've got, and if you're using since it always will stop before the newest fragment... Make a new sdshare:foo element on feed level for this information? http://projects.topicmapslab.de/issues/4308
  • 40. Blank nodes are not supported What to do? http://projects.topicmapslab.de/issues/4306
  • 41. More information SDshare spec http://www.egovpt.org/fg/CWA_Part_1b SDshare issue tracker http://projects.topicmapslab.de/projects/sdshare SDshare use cases http://www.garshol.priv.no/blog/215.html