Introduction to SDshare


An introduction to the SDshare protocol for replicating/syndicating changes in Topic Maps or RDF stores as Atom feeds.

Published in: Technology

1. An introduction to SDshare
   2011-03-15
   Lars Marius Garshol, <>
2. Overview of SDshare
3. SDshare
   - A protocol for tracking changes in a semantic datastore
     - essentially allows clients to keep track of all changes, for replication purposes
   - Supports both Topic Maps and RDF
   - Based on Atom
   - Highly RESTful
   - A CEN specification
4. Basic workings
   - The server publishes fragments representing changes in the datastore
   - The client pulls these in and updates its local copy of the dataset
   - There is, however, more to it than just this
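The pull loop described above can be sketched as follows. This is a minimal illustration, not a real client: `fetch`, `parse_fragment_links`, and `apply_fragment` are hypothetical callables supplied by the caller, and the sketch assumes fragments carry an "updated" timestamp they can be ordered by.

```python
# One SDshare polling round (sketch). fetch, parse_fragment_links and
# apply_fragment are hypothetical helpers, not part of any real API.

def sync_once(fragment_feed_url, fetch, parse_fragment_links, apply_fragment,
              last_seen=None):
    """Fetch the fragment feed, apply unseen fragments in timestamp order,
    and return the timestamp of the newest fragment applied."""
    feed = fetch(fragment_feed_url)
    for updated, href in sorted(parse_fragment_links(feed)):
        if last_seen is None or updated > last_seen:
            apply_fragment(fetch(href))   # update the local copy
            last_seen = updated
    return last_seen
```

A caller would invoke `sync_once` on a timer; keeping `last_seen` between rounds is what makes repeated polling cheap.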
5. What more is needed?
   - Support for more than one dataset per server
     - this means: more than one fragment stream
   - How do clients get started?
     - a change feed is nice once you've got a copy of the dataset, but how do you get a copy?
   - What if you miss out on some changes and need to restart?
     - there must be a way to reset the local copy
   - The protocol supports all of this
6. Two new concepts
   - Collection
     - essentially a dataset inside the server
     - the exact meaning is not defined in the spec
     - will generally be a topic map (Topic Maps) or a graph (RDF)
   - Snapshot
     - a complete copy of a collection at some point in time
7. Feeds in the server
   - The overview feed lists the collection feeds
   - Each collection feed leads to a snapshot feed (linking to snapshots) and a fragment feed (linking to fragments)
8. An overview feed

   <feed xmlns="" xmlns:sdshare="">
     <title>SDshare feeds from localhost</title>
     <updated>2011-03-15T18:55:38Z</updated>
     <author>
       <name>Ontopia SDshare server</name>
     </author>
     <id>http://localhost:8080/sdshare/</id>
     <link href="http://localhost:8080/sdshare/"></link>
     <entry>
       <title>beer.xtm</title>
       <updated>2011-03-15T18:55:38Z</updated>
       <id>http://localhost:8080/sdshare/beer.xtm</id>
       <link href="collection.jsp?topicmap=beer.xtm" type="application/atom+xml" rel=""></link>
     </entry>
     <entry>
       <title>metadata.xtm</title>
       <updated>2011-03-15T18:55:38Z</updated>
       <id>http://localhost:8080/sdshare/metadata.xtm</id>
       <link href="collection.jsp?topicmap=metadata.xtm" type="application/atom+xml" rel=""></link>
     </entry>
   </feed>
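A feed like this can be consumed with nothing but the standard library. The sketch below pulls out the collection titles and feed links; the sample feed is abbreviated from the slide, and the standard Atom namespace (stripped from the slide's xmlns declaration) is filled in for the sample only.

```python
# Discover collections from an SDshare overview feed (sketch).
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"   # standard Atom namespace

def collections(overview_xml):
    """Return (title, href) for each collection entry in an overview feed."""
    root = ET.fromstring(overview_xml)
    result = []
    for entry in root.findall(ATOM + "entry"):
        title = entry.findtext(ATOM + "title")
        link = entry.find(ATOM + "link")
        result.append((title, link.get("href")))
    return result

overview = """<feed xmlns="http://www.w3.org/2005/Atom">
 <title>SDshare feeds from localhost</title>
 <entry>
  <title>beer.xtm</title>
  <link href="collection.jsp?topicmap=beer.xtm" type="application/atom+xml"/>
 </entry>
</feed>"""
print(collections(overview))   # [('beer.xtm', 'collection.jsp?topicmap=beer.xtm')]
```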
9. The snapshot feed
   - A list of links to snapshots of the entire dataset (collection)
   - The spec doesn't say anything about how and when snapshots are produced
     - it's up to implementations to decide how they want to do this
     - it makes sense, though, to always have a snapshot of the current state of the dataset
10. Example snapshot feed

   <feed xmlns="" xmlns:sdshare="">
     <title>Snapshots feed for beer.xtm</title>
     <updated>2011-03-15T19:12:34Z</updated>
     <author>
       <name>Ontopia SDshare server</name>
     </author>
     <id>file:/Users/larsga/data/topicmaps/beer.xtm/snapshots</id>
     <sdshare:ServerSrcLocatorPrefix>file:/Users/larsga/data/topicmaps/beer.xtm</sdshare:ServerSrcLocatorPrefix>
     <entry>
       <title>Snapshot of beer.xtm</title>
       <updated>2011-03-15T19:12:34Z</updated>
       <id>file:/Users/larsga/data/topicmaps/beer.xtm/snapshot/0</id>
       <link href="snapshot.jsp?topicmap=beer.xtm" type="application/x-tm+xml; version=1.0" rel="alternate"></link>
     </entry>
   </feed>
11. The fragment feed
   - For every change in the topic map, there is one fragment
     - the granularity of changes is not defined by the spec
     - it could be per transaction, or per topic changed
   - The fragment is basically a link to a URL that produces a part of the dataset
12. An example fragment feed

   <feed xmlns="" xmlns:sdshare="">
     <title>Fragments feed for beer.xtm</title>
     <updated>2011-03-15T19:21:20Z</updated>
     <author>
       <name>Ontopia SDshare server</name>
     </author>
     <id>file:/Users/larsga/data/topicmaps/beer.xtm/fragments</id>
     <sdshare:ServerSrcLocatorPrefix>file:/Users/larsga/data/topicmaps/beer.xtm</sdshare:ServerSrcLocatorPrefix>
     <entry>
       <title>Topic with object ID 4521</title>
       <updated>2011-03-15T19:20:03Z</updated>
       <id>file:/Users/larsga/data/topicmaps/beer.xtm/4521/1300216803730</id>
       <link href="fragment.jsp?topicmap=beer.xtm&amp;topic=4521&amp;syntax=rdf" type="application/rdf+xml" rel="alternate"/>
       <link href="fragment.jsp?topicmap=beer.xtm&amp;topic=4521&amp;syntax=xtm" type="application/x-tm+xml; version=1.0" rel="alternate"/>
       <sdshare:TopicSI></sdshare:TopicSI>
     </entry>
   </feed>
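Reading such entries on the client side might look like the sketch below. The slide's xmlns declarations were stripped, so the sdshare namespace URI here is an explicit placeholder (`urn:example:sdshare`); substitute whatever URI your server actually declares, and note the sample feed and TopicSI value are invented for illustration.

```python
# Extract entries from an SDshare fragment feed (sketch).
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"   # standard Atom namespace
SD = "{urn:example:sdshare}"             # PLACEHOLDER sdshare namespace

def fragment_entries(feed_xml):
    """Return (updated, topic_si, links) per entry; links keyed by MIME type."""
    root = ET.fromstring(feed_xml)
    entries = []
    for e in root.findall(ATOM + "entry"):
        links = {l.get("type"): l.get("href") for l in e.findall(ATOM + "link")}
        entries.append((e.findtext(ATOM + "updated"),
                        e.findtext(SD + "TopicSI"),
                        links))
    return entries

feed = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:sdshare="urn:example:sdshare">
 <entry>
  <updated>2011-03-15T19:20:03Z</updated>
  <link href="fragment.jsp?topic=4521&amp;syntax=rdf" type="application/rdf+xml" rel="alternate"/>
  <sdshare:TopicSI>http://example.org/topic/4521</sdshare:TopicSI>
 </entry>
</feed>"""
print(fragment_entries(feed))
```

Keying the links by MIME type lets a client pick the syntax (RDF/XML or XTM) it knows how to apply.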
13. What is a fragment?
   - Essentially, a piece of a topic map
     - that is, a complete XTM file that contains only part of a bigger topic map
     - typically, most of the topic references will point to topics not in the XTM file
   - Downloading more fragments will yield a bigger subset of the topic map
     - the automatic merging in Topic Maps will cause the fragments to match up
   - Exactly the same applies in RDF
14. An example fragment

   <topicMap xmlns="" xmlns:xlink="">
     <topic id="id4521">
       <instanceOf>
         <subjectIndicatorRef xlink:href=""></subjectIndicatorRef>
       </instanceOf>
       <subjectIdentity>
         <subjectIndicatorRef xlink:href=""></subjectIndicatorRef>
         <topicRef xlink:href="file:/Users/larsga/data/topicmaps/beer.xtm#id2662"></topicRef>
       </subjectIdentity>
       <baseName>
         <baseNameString>Amundsen Bryggeri og Spiseri</baseNameString>
       </baseName>
       <occurrence>
         <instanceOf>
           <subjectIndicatorRef xlink:href=""></subjectIndicatorRef>
         </instanceOf>
         <resourceData>59.913816</resourceData>
       </occurrence>
       ...
     </topic>
     ...
   </topicMap>
15. Applying a fragment
   - The feed contains a URI prefix
     - this is used to create item identifiers tagging statements with their origin
   - For each TopicSI, find that topic, then
     - for each statement, remove the matching item identifier
     - if the statement now has no item identifiers, delete it
   - Merge in the received fragment
     - then tag all statements in it with the matching item identifier
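The steps above can be shown as a toy sketch over an in-memory store, where each statement (a subject/property/value tuple) maps to the set of item identifiers recording its origin. The data model and names are illustrative, not from the spec.

```python
# Toy sketch of the apply-fragment algorithm. The store is a dict
# mapping statement tuples to sets of item identifiers (origins);
# this data model is an illustration, not the spec's.

def apply_fragment(store, prefix, topic_sis, fragment_statements):
    # 1. For each changed topic, remove this source's item identifier
    #    from every statement about it; delete statements left untagged.
    for stmt in list(store):
        subject = stmt[0]
        if subject in topic_sis:
            store[stmt].discard(prefix + subject)
            if not store[stmt]:
                del store[stmt]
    # 2. Merge in the received fragment, tagging each statement
    #    with the matching item identifier.
    for stmt in fragment_statements:
        store.setdefault(stmt, set()).add(prefix + stmt[0])
    return store
```

Because step 1 only removes this source's tag and step 2 re-adds exactly the fragment's statements, applying the same fragment twice leaves the store unchanged, which is the idempotence property claimed on the next slide.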
16. Properties of the protocol
   - HATEOAS
     - uses hypertext principles
     - the only endpoint is that of the overview feed
     - all other URLs are available via hypertext
   - Applying a fragment is idempotent
     - i.e., the result is the same no matter how many times you do it
   - Loose binding
     - very loose binding between server and client
   - Supports federation of data
     - clients can safely merge data from different sources
17. SDshare push
   - In normal SDshare, data receivers connect to the data source
     - basically, they poll the source with GET requests
   - However, the receiver is not always allowed to make connections to the source
     - SDshare push is designed for this situation
   - The solution is a slightly modified protocol
     - the source POSTs Atom feeds with inline fragments to the recipient
     - this flips the server/client relationship
   - Not part of the spec; an unofficial Ontopia extension
18. Uses of SDshare
19. Example use case #1
    [diagram: Frontend, Database, Ontopia, DB2TM, JDBC, Portal]
20. Example use case #1
    [diagram: Service #1, Frontend, Database, Ontopia, DB2TM, SDshare, Ontopia, SDshare, Service #3, Portal, ESB]
21. NRK/Skole today
    [diagram: Production environment; Editorial server; MediaDB; Prod #1; Prod #2; DB2TM; Export; JDBC; JDBC; nrk-grep.xtm; Import; DB server 1; DB server 2; Database; Firewall; Server]
22. NRK/Skole with SDshare push
    [diagram: Production environment; SDshare PUSH; Editorial server; MediaDB; Prod #1; Prod #2; DB2TM; JDBC; JDBC; DB server 1; DB server 2; Database; Firewall; Server]
23. Hafslund
    [diagram: ERP, GIS, CRM, ...; UMIC; Search engine; Archive]
24. Hafslund architecture
   - The beauty of this architecture is that SDshare insulates the different systems from one another
   - More input systems can be added without hassle
   - Any component can be replaced without affecting the others
   - Essentially, a plug-and-play architecture
25. A Hafslund problem
   - There are too many duplicates in the data
     - duplicates within each system
     - also duplication across systems
   - How to get rid of the duplicates?
     - unrealistic to expect cleanup across the systems
   - So, we build a deduplicator
     - and plug it in...
26. DuKe plugged in
    [diagram: ERP, GIS, CRM, ...; UMIC; Search engine; Dupe Killer; Archive]
27. Implementations
28. Current implementations
   - Web3
     - both client and server
   - Ontopia
     - ditto, plus SDshare push
   - Isidorus
     - status unknown
   - Atomico
     - server framework only; no actual implementation
29. Ontopia SDshare server
   - Event tracker
     - taps into the event API, where it listens for changes
     - maintains an in-memory list of changes
     - writes all changes to disk as well
     - removes duplicate changes and discards old ones
   - Web application based on the tracker
     - JSP pages producing feeds and fragments
     - one fragment per changed topic, sorted by time
     - only a single snapshot, of the current state of the topic map
30. Ontopia SDshare client
   - Web UI for management
   - Pluggable frontends and pluggable backends; combine at will
   - Frontends
     - Ontopia: event listener
     - SDshare: polls Atom feeds
   - Backends
     - Ontopia: applies changes to a local Ontopia instance
     - SPARQL: writes changes to an RDF repository via SPARUL
     - push: pushes changes over SDshare push
    [diagram: Web UI; frontends (Ontopia events, SDshare client); Core logic; backends (Ontopia, SPARQL Update, SDshare push)]
31. Web UI to client
32. Problems with the spec
33. What if there are many fragments?
   - The size of the fragment feed grows enormous
     - expensive if polled frequently
   - Paging might be one solution
     - basically, the end of the feed contains a pointer to more
   - A "since" parameter might be another
     - allows the client to say "only show me changes since ..."
   - We probably need both in practice
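On the client side, the proposed "since" parameter would just be a query parameter added to the fragment feed URL. The sketch below assumes a server that accepted `?since=<timestamp>`; the parameter name is the slide's proposal, not something settled in the spec.

```python
# Sketch: add a since=<timestamp> parameter to a fragment feed URL.
# The "since" parameter is a proposal, not part of the SDshare spec.
from urllib.parse import urlencode, urlparse, parse_qsl, urlunparse

def with_since(feed_url, since):
    """Append or replace a since= query parameter on the feed URL."""
    parts = urlparse(feed_url)
    query = dict(parse_qsl(parts.query))
    query["since"] = since
    return urlunparse(parts._replace(query=urlencode(query)))

print(with_since("http://localhost:8080/sdshare/fragments?topicmap=beer.xtm",
                 "2011-03-15T19:21:20Z"))
```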
34. Ordering of fragments
   - Should the spec require that fragments be ordered?
     - not really necessary if all fragment URIs return the current state (instead of the state at the time the fragment entry was created)
35. RDF fragment algorithm
   - The one given in the spec makes no sense
     - it relies on Topic Maps constructs not found in RDF
     - there is really no way to make use of it
36. Our interpretation
   - The server prefix is the URI of an RDF named graph
   - The fragment algorithm therefore becomes:
     - delete all statements about the changed resources
     - then add all statements in the fragment
   - This means each source gets a different graph
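This interpretation translates directly into SPARQL Update: one DELETE per changed resource in the source's named graph, then an INSERT DATA with the fragment's triples. The sketch below just builds the update strings; the graph URI, resource URI, and triples are invented examples.

```python
# Sketch: turn one SDshare fragment into SPARQL Update statements,
# under the "one named graph per source" interpretation above.
# All URIs and literals below are invented examples.

def fragment_to_sparul(graph, resources, triples):
    """Build DELETE + INSERT DATA statements for one fragment.
    triples are (subject, predicate, object) with objects pre-serialized."""
    deletes = "\n".join(
        "DELETE WHERE { GRAPH <%s> { <%s> ?p ?o } };" % (graph, r)
        for r in resources)
    inserts = "INSERT DATA { GRAPH <%s> {\n%s\n} };" % (
        graph,
        "\n".join("  <%s> <%s> %s ." % t for t in triples))
    return deletes + "\n" + inserts

print(fragment_to_sparul(
    "http://localhost:8080/sdshare/beer.xtm",
    ["http://example.org/topic/4521"],
    [("http://example.org/topic/4521",
      "http://example.org/name", '"Amundsen Bryggeri og Spiseri"')]))
```

Deleting by subject before inserting is what makes the per-fragment update idempotent even without the Topic Maps item-identifier machinery.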
37. TopicSL/TopicII
   - Currently, topics can only be identified by subject identifier
     - but not all topics have one
   - Solution
     - add elements for subject locators and item identifiers
38. Paging of snapshots?
   - What if the snapshot is vast?
     - clients probably won't be able to download and store the entire thing in one go
   - Could we page the snapshot into fragments?
   - Or is there some other solution?
39. How to tell if the fragment feed is complete?
   - When reading the fragment feed, how can we tell whether older fragments have been discarded?
     - and how can we tell which fragment was the newest to be thrown away?
   - Without this, there is no way to know for certain whether you've lost fragments when the feed stops before the newest fragment you've got
     - and if you're using "since", it always will stop before the newest fragment...
   - Make a new sdshare:foo element on the feed level for this information?
40. Blank nodes are not supported
   - What to do?
41. More information
   - SDshare spec
   - SDshare issue tracker
   - SDshare use cases