Introduction to SDshare

An introduction to the SDshare protocol for replicating/syndicating Atom feeds of changes in Topic Maps or RDF stores

Introduction to SDshare Presentation Transcript

  • 1. An introduction to SDshare
    2011-03-15
    Lars Marius Garshol, <larsga@bouvet.no>
    http://twitter.com/larsga
  • 2. Overview of SDshare
  • 3. SDshare
    A protocol for tracking changes in a semantic datastore
    essentially allows clients to keep track of all changes, for replication purposes
    Supports both Topic Maps and RDF
    Based on Atom
    Highly RESTful
    A CEN specification
  • 4. Basic workings
    [Diagram: a server, a client, and a stream of fragments between them]
    Server publishes fragments representing changes in the datastore
    Client pulls these in and updates its local copy of the dataset
    There is, however, more to it than just this
  • 5. What more is needed?
    Support for more than one dataset per server
    this means: more than one fragment stream
    How do clients get started?
    a change feed is nice once you've got a copy of the dataset, but how do you get a copy?
    What if you miss out on some changes and need to restart?
    must be a way to reset local copy
    The protocol supports all this
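To make the bootstrap-and-reset cycle concrete, here is a minimal Python sketch of a client-side replica. Purely for illustration, it assumes that a collection can be modelled as a dictionary and each change as an `(updated, key, value)` tuple with Atom-style timestamps. `Replica`, `bootstrap`, `apply_changes` and `reset` are hypothetical names, not part of the spec.

```python
class Replica:
    """Hypothetical client-side copy of one SDshare collection (toy model)."""

    def __init__(self):
        self.data = None        # local copy of the collection (None = no copy yet)
        self.last_seen = None   # 'updated' time of the newest change applied

    def bootstrap(self, snapshot, snapshot_time):
        # Start (or restart) from a complete copy of the collection
        self.data = dict(snapshot)
        self.last_seen = snapshot_time

    def apply_changes(self, changes):
        # changes: iterable of (updated, key, value); apply oldest first and
        # skip anything already covered by the snapshot or earlier changes
        for updated, key, value in sorted(changes, key=lambda c: c[0]):
            if updated > self.last_seen:
                self.data[key] = value
                self.last_seen = updated

    def reset(self):
        # Throw away the local copy; the next sync must bootstrap again
        self.data = None
        self.last_seen = None
```

ISO 8601 timestamps in UTC, like those in Atom feeds, compare correctly as plain strings. That is why the sketch can use `>` on them directly.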
  • 6. Two new concepts
    Collection
    essentially a dataset inside the server
    exact meaning is not defined in the spec
    will generally be a topic map (TMs) or a graph (RDF)
    Snapshot
    a complete copy of a collection at some point in time
  • 7. Feeds in the server
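    [Diagram: the overview feed links to one collection feed per collection; each collection feed leads to a snapshot feed (listing snapshots) and a fragment feed (listing fragments)]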
  • 8. An overview feed
    <feed xmlns="http://www.w3.org/2005/Atom" xmlns:sdshare="http://www.egovpt.org/sdshare">
      <title>SDshare feeds from localhost</title>
      <updated>2011-03-15T18:55:38Z</updated>
      <author>
        <name>Ontopia SDshare server</name>
      </author>
      <id>http://localhost:8080/sdshare/</id>
      <link href="http://localhost:8080/sdshare/"></link>
      <entry>
        <title>beer.xtm</title>
        <updated>2011-03-15T18:55:38Z</updated>
        <id>http://localhost:8080/sdshare/beer.xtm</id>
        <link href="collection.jsp?topicmap=beer.xtm" type="application/atom+xml" rel="http://www.egovpt.org/sdshare/collectionfeed"></link>
      </entry>
      <entry>
        <title>metadata.xtm</title>
        <updated>2011-03-15T18:55:38Z</updated>
        <id>http://localhost:8080/sdshare/metadata.xtm</id>
        <link href="collection.jsp?topicmap=metadata.xtm" type="application/atom+xml" rel="http://www.egovpt.org/sdshare/collectionfeed"></link>
      </entry>
    </feed>
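A client only needs to know the overview feed URL. Everything else is discovered by following links. The sketch below uses Python's standard `xml.etree.ElementTree`. It assumes that the `rel` value shown above identifies collection feeds and that relative `href`s are resolved against the overview feed's own URL. `collection_feeds` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

ATOM = "{http://www.w3.org/2005/Atom}"
COLLECTION_FEED = "http://www.egovpt.org/sdshare/collectionfeed"

def collection_feeds(overview_xml, base_url):
    """Return {collection title: absolute collection feed URL}."""
    feed = ET.fromstring(overview_xml)
    result = {}
    for entry in feed.findall(ATOM + "entry"):
        title = entry.findtext(ATOM + "title")
        for link in entry.findall(ATOM + "link"):
            if link.get("rel") == COLLECTION_FEED:
                # hrefs such as "collection.jsp?topicmap=beer.xtm" are relative,
                # so resolve them against the URL the overview feed came from
                result[title] = urljoin(base_url, link.get("href"))
    return result
```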
  • 9. The snapshot feed
    A list of links to snapshots of the entire dataset (collection)
    The spec doesn't say anything about how and when snapshots are produced
    It's up to implementations to decide how they want to do this
    It makes sense, though, to always have a snapshot for the current state of the dataset
  • 10. Example snapshot feed
    <feed xmlns="http://www.w3.org/2005/Atom" xmlns:sdshare="http://www.egovpt.org/sdshare">
      <title>Snapshots feed for beer.xtm</title>
      <updated>2011-03-15T19:12:34Z</updated>
      <author>
        <name>Ontopia SDshare server</name>
      </author>
      <id>file:/Users/larsga/data/topicmaps/beer.xtm/snapshots</id>
      <sdshare:ServerSrcLocatorPrefix>file:/Users/larsga/data/topicmaps/beer.xtm</sdshare:ServerSrcLocatorPrefix>
      <entry>
        <title>Snapshot of beer.xtm</title>
        <updated>2011-03-15T19:12:34Z</updated>
        <id>file:/Users/larsga/data/topicmaps/beer.xtm/snapshot/0</id>
        <link href="snapshot.jsp?topicmap=beer.xtm" type="application/x-tm+xml; version=1.0" rel="alternate"></link>
      </entry>
    </feed>
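Here is a sketch of how a client might pick a snapshot from this feed, again using `xml.etree.ElementTree`. It assumes the client wants XTM (`application/x-tm+xml`) and ignores media-type parameters such as `version=1.0`. `snapshot_links` and the base URL are illustrative assumptions. A missing `rel` is treated as `alternate`, as Atom specifies.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

ATOM = "{http://www.w3.org/2005/Atom}"
SDSHARE = "{http://www.egovpt.org/sdshare}"

def snapshot_links(feed_xml, base_url, wanted_type="application/x-tm+xml"):
    """Return (server prefix, [(updated, snapshot URL), ...] newest first)."""
    feed = ET.fromstring(feed_xml)
    prefix = feed.findtext(SDSHARE + "ServerSrcLocatorPrefix")
    links = []
    for entry in feed.findall(ATOM + "entry"):
        for link in entry.findall(ATOM + "link"):
            # Compare the bare media type, ignoring parameters like "; version=1.0"
            mime = link.get("type", "").split(";")[0].strip()
            # Atom treats a missing rel attribute as "alternate"
            if link.get("rel", "alternate") == "alternate" and mime == wanted_type:
                url = urljoin(base_url, link.get("href"))
                links.append((entry.findtext(ATOM + "updated"), url))
    links.sort(reverse=True)  # UTC ISO 8601 timestamps sort lexically
    return prefix, links
```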
  • 11. The fragment feed
    For every change in the topic map, there is one fragment
    the granularity of changes is not defined by the spec
    it could be per transaction, or per topic changed
    The fragment is basically a link to a URL that produces a part of the dataset
  • 12. An example fragment feed
    <feed xmlns="http://www.w3.org/2005/Atom" xmlns:sdshare="http://www.egovpt.org/sdshare">
      <title>Fragments feed for beer.xtm</title>
      <updated>2011-03-15T19:21:20Z</updated>
      <author>
        <name>Ontopia SDshare server</name>
      </author>
      <id>file:/Users/larsga/data/topicmaps/beer.xtm/fragments</id>
      <sdshare:ServerSrcLocatorPrefix>file:/Users/larsga/data/topicmaps/beer.xtm</sdshare:ServerSrcLocatorPrefix>
      <entry>
        <title>Topic with object ID 4521</title>
        <updated>2011-03-15T19:20:03Z</updated>
        <id>file:/Users/larsga/data/topicmaps/beer.xtm/4521/1300216803730</id>
        <link href="fragment.jsp?topicmap=beer.xtm&amp;topic=4521&amp;syntax=rdf" type="application/rdf+xml" rel="alternate"/>
        <link href="fragment.jsp?topicmap=beer.xtm&amp;topic=4521&amp;syntax=xtm" type="application/x-tm+xml; version=1.0" rel="alternate"/>
        <sdshare:TopicSI>http://psi.example.org/12</sdshare:TopicSI>
      </entry>
    </feed>
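Reading the fragment feed follows the same pattern, except that each entry also names the changed topic via `sdshare:TopicSI`. The helper below is hypothetical. It assumes the client picks one representation per entry by media type and processes entries oldest first.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

ATOM = "{http://www.w3.org/2005/Atom}"
SDSHARE = "{http://www.egovpt.org/sdshare}"

def fragment_entries(feed_xml, base_url, wanted_type):
    """Return [(updated, fragment URL, [subject identifiers]), ...] oldest first."""
    feed = ET.fromstring(feed_xml)
    entries = []
    for entry in feed.findall(ATOM + "entry"):
        sis = [si.text for si in entry.findall(SDSHARE + "TopicSI")]
        for link in entry.findall(ATOM + "link"):
            mime = link.get("type", "").split(";")[0].strip()
            if link.get("rel", "alternate") == "alternate" and mime == wanted_type:
                # The parser has already turned "&amp;" in the href back into "&"
                url = urljoin(base_url, link.get("href"))
                entries.append((entry.findtext(ATOM + "updated"), url, sis))
                break  # one representation per entry is enough
    entries.sort(key=lambda e: e[0])
    return entries
```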
  • 13. What is a fragment?
    Essentially, a piece of a topic map
    that is, a complete XTM file that contains only part of a bigger topic map
    typically, most of the topic references will point to topics not in the XTM file
    Downloading more fragments will yield a bigger subset of the topic map
    the automatic merging in Topic Maps will cause the fragments to match up
    Exactly the same applies in RDF
  • 14. An example fragment
    <topicMap xmlns="http://www.topicmaps.org/xtm/1.0/" xmlns:xlink="http://www.w3.org/1999/xlink">
      <topic id="id4521">
        <instanceOf>
          <subjectIndicatorRef xlink:href="http://psi.garshol.priv.no/beer/pub"></subjectIndicatorRef>
        </instanceOf>
        <subjectIdentity>
          <subjectIndicatorRef xlink:href="http://psi.example.org/12"></subjectIndicatorRef>
          <topicRef xlink:href="file:/Users/larsga/data/topicmaps/beer.xtm#id2662"></topicRef>
        </subjectIdentity>
        <baseName>
          <baseNameString>Amundsen Bryggeri og Spiseri</baseNameString>
        </baseName>
        <occurrence>
          <instanceOf>
            <subjectIndicatorRef xlink:href="http://psi.ontopia.net/ontology/latitude"></subjectIndicatorRef>
          </instanceOf>
          <resourceData>59.913816</resourceData>
        </occurrence>
        ...
      </topic>
      ...
    </topicMap>
  • 15. Applying a fragment
    The feed contains a URI prefix
    this is used to create item identifiers tagging statements with their origin
    For each TopicSI find that topic, then
    for each statement, remove matching item identifier
    if statement now has no item identifiers, delete it
    Merge in the received fragment
    then tag all statements in it with matching item identifier
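The untag-then-merge steps above can be sketched with a toy store. This is a simplification, not Ontopia's implementation. Statements are `(topic SI, property, value)` tuples, and the feed's prefix itself stands in for the item identifier that would be derived from it.

```python
def apply_fragment(store, prefix, topic_sis, fragment):
    """Apply one SDshare fragment to a toy statement store.

    store:     {statement: set of origin tags}; statements are (topic SI, property, value)
    prefix:    the feed's server prefix, used directly as the origin tag (simplification)
    topic_sis: subject identifiers listed in the entry's sdshare:TopicSI elements
    fragment:  set of statements contained in the downloaded fragment
    """
    # For each changed topic, remove this source's tag from its statements
    for stmt in list(store):
        if stmt[0] in topic_sis:
            store[stmt].discard(prefix)
            if not store[stmt]:
                del store[stmt]  # no source vouches for it any more
    # Merge in the fragment and tag every statement with this source
    for stmt in fragment:
        store.setdefault(stmt, set()).add(prefix)
```

Applying the same fragment twice leaves the store unchanged. A statement also survives for as long as at least one source still supplies it, which is what makes merging data from several sources safe.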
  • 16. Properties of the protocol
    HATEOAS (Hypermedia as the Engine of Application State)
    uses hypertext principles
    the only fixed endpoint is the overview feed
    all other URLs are reachable via hypertext links
    Applying a fragment is idempotent
    i.e., the result is the same no matter how many times you apply it
    Loose binding
    very loose binding between server and client
    Supports federation of data
    client can safely merge data from different sources
  • 17. SDshare push
    In normal SDshare, data receivers connect to the data source
    basically, they poll the source with GET requests
    However, the receiver is not always allowed to make connections to the source
    SDshare push is designed for this situation
    The solution is a slightly modified protocol
    the source POSTs Atom feeds with inline fragments to the recipient
    this flips the server/client relationship
    Not part of the spec; unofficial Ontopia extension
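The exact wire format of the push extension isn't given here, so the following is only a hypothetical sketch. It builds a minimal Atom feed (not schema-complete) that carries one fragment inline in `<content>`. It then prepares an HTTP POST with `urllib.request` but does not send it. The receiver URL and the function name are assumptions.

```python
import urllib.request
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
SDSHARE_NS = "http://www.egovpt.org/sdshare"

def build_push_request(receiver_url, topic_si, fragment_xml, updated):
    """Build (but do not send) a POST carrying one inline fragment."""
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    entry = ET.SubElement(feed, f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = updated
    ET.SubElement(entry, f"{{{SDSHARE_NS}}}TopicSI").text = topic_si
    # Inline XML content: the fragment's root element becomes a child of <content>
    content = ET.SubElement(entry, f"{{{ATOM_NS}}}content", type="application/x-tm+xml")
    content.append(ET.fromstring(fragment_xml))
    body = ET.tostring(feed, encoding="utf-8")
    # The source POSTs to the recipient, flipping the usual client/server roles
    return urllib.request.Request(receiver_url, data=body, method="POST",
                                  headers={"Content-Type": "application/atom+xml"})
```

Sending the request would then be a matter of `urllib.request.urlopen(req)` against a real receiver.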
  • 18. Uses of SDshare
  • 19. Example use case #1
    [Diagram components: Frontend, Database, Ontopia, DB2TM, JDBC, Portal]
  • 20. Example use case #1
    [Diagram components: Service #1, Frontend, Database, DB2TM, two Ontopia instances, Service #3, Portal, ESB; two connections labelled SDshare]
  • 21. NRK/Skole today
    [Diagram components: Production environment, Editorial server, MediaDB, Prod #1, Prod #2, DB2TM, Export, Import, nrk-grep.xtm, two JDBC connections, DB server 1, DB server 2, Database, Firewall, Server]
  • 22. NRK/Skole with SDshare push
    [Diagram components: Production environment, SDshare PUSH, Editorial server, MediaDB, Prod #1, Prod #2, DB2TM, two JDBC connections, DB server 1, DB server 2, Database, Firewall, Server]
  • 23. Hafslund
    [Diagram components: ERP, GIS, CRM, ..., UMIC, Search engine, Archive]
  • 24. Hafslund architecture
    The beauty of this architecture is that SDshare insulates the different systems from one another
    More input systems can be added without hassle
    Any component can be replaced without affecting the others
    Essentially, a plug-and-play architecture
  • 25. A Hafslund problem
    There are too many duplicates in the data
    duplicates within each system
    also duplication across systems
    How to get rid of the duplicates?
    unrealistic to expect cleanup across systems
    So, we build a deduplicator
    and plug it in...
  • 26. DuKe plugged in
    [Diagram components: ERP, GIS, CRM, ..., UMIC, Search engine, Dupe Killer, Archive]
  • 27. Implementations
  • 28. Current implementations
    Web3
    both client and server
    Ontopia
    ditto + SDshare push
    Isidorus
    don't know
    Atomico
    server framework only; no actual implementation
  • 29. Ontopia SDshare server
    Event tracker
    taps into event API where it listens for changes
    maintains in-memory list of changes
    writes all changes to disk as well
    removes duplicate changes and discards old changes
    Web application based on tracker
    JSP pages producing feeds and fragments
    one fragment per changed topic, sorted by time
    only a single snapshot of current state of TM
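The tracker's bookkeeping (deduplication, discarding old changes, one entry per changed topic in time order) might look roughly like this. It is a hypothetical sketch, not Ontopia's code. It omits the disk persistence, assumes events arrive in time order, and uses an arbitrary size limit to decide what counts as "old".

```python
from collections import OrderedDict

class ChangeTracker:
    """Hypothetical in-memory change list (disk persistence omitted)."""

    def __init__(self, max_changes=1000):
        self.max_changes = max_changes
        self.changes = OrderedDict()  # topic id -> time of its latest change

    def topic_changed(self, topic_id, timestamp):
        # A newer change to the same topic replaces the older one (dedup),
        # moving the topic to the end of the time-ordered list
        self.changes.pop(topic_id, None)
        self.changes[topic_id] = timestamp
        # Discard the oldest changes once the list grows too long
        while len(self.changes) > self.max_changes:
            self.changes.popitem(last=False)

    def fragments(self):
        # One fragment entry per changed topic, oldest first
        return list(self.changes.items())
```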
  • 30. Ontopia SDshare client
    Web UI for management
    Pluggable frontends
    Pluggable backends
    Combine at will
    Frontends
    Ontopia: event listener
    SDshare: polls Atom feeds
    Backends
    Ontopia: applies changes to Ontopia locally
    SPARQL: writes changes to RDF repo via SPARUL
    push: pushes changes over SDshare push
    [Diagram: Web UI and core logic, with frontends (Ontopia events, SDshare client) and backends (Ontopia backend, SPARQL Update, SDshare push)]
  • 31. Web UI to client
  • 32. Problems with the spec
  • 33. What if many fragments?
    The fragments feed can grow enormous
    expensive if polled frequently
    Paging might be one solution
    basically, the end of each page contains a pointer to the next
    "since" parameter might be another
    allows client to say "only show me changes since ..."
    Probably need both in practice
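Neither paging nor `since` is in the spec, so this sketch is hypothetical on both counts. It assumes that pages link to each other with `rel="next"` (the convention from RFC 5005, Feed Paging and Archiving) and that the server accepts a `since` query parameter. The `fetch` function is passed in so that the logic can be exercised without a server.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode, urljoin

ATOM = "{http://www.w3.org/2005/Atom}"

def read_fragment_feed(fetch, feed_url, since=None):
    """Collect entry 'updated' times across all pages of a fragment feed.

    fetch: function mapping a URL to feed XML (injected so it can be stubbed)
    since: value for the hypothetical "since" query parameter, or None
    """
    url = feed_url
    if since is not None:
        url += ("&" if "?" in url else "?") + urlencode({"since": since})
    updated = []
    while url:
        feed = ET.fromstring(fetch(url))
        updated += [e.findtext(ATOM + "updated") for e in feed.findall(ATOM + "entry")]
        # Paging: follow a rel="next" link if this page has one
        nxt = next((link.get("href") for link in feed.findall(ATOM + "link")
                    if link.get("rel") == "next"), None)
        url = urljoin(url, nxt) if nxt else None
    return updated
```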
    http://projects.topicmapslab.de/issues/3675
  • 34. Ordering of fragments
    Should the spec require that fragments be ordered?
    not really necessary if all fragment URIs return current state (instead of state at time fragment entry was created)
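The following small illustration shows why ordering stops mattering. If dereferencing a fragment URL always returns the topic's current state, then applying entries in any order converges to the same result. With state-at-entry-time semantics, by contrast, whichever entry is applied last wins. The version strings and helpers are made up for the example.

```python
def sync(entries, resolve):
    """Apply fragment entries, in the given order, to an empty local copy.

    entries: (topic id, version at the time the entry was created) pairs
    resolve: what dereferencing the entry's fragment URL returns
    """
    local = {}
    for topic, version in entries:
        local[topic] = resolve(topic, version)  # replace the topic's local state
    return local

def at_entry_time(topic, version):
    # Fragment URL returns the topic as it was when the entry was created
    return version

def current_state(latest):
    # Fragment URL returns the topic as it is now, whatever the entry says
    return lambda topic, version: latest[topic]
```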
  • 35. RDF fragment algorithm
    The one given in the spec makes no sense
    Relies on Topic Maps constructs not found in RDF
    Really no way to make use of it
    http://projects.topicmapslab.de/issues/4013
  • 36. Our interpretation
    Server prefix is URI of RDF named graph
    Fragment algorithm therefore becomes
    delete all statements about changed resources
    then add all statements in fragment
    Means each source gets a different graph
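Under this interpretation, applying an RDF fragment reduces to two set operations on a quad store. The minimal in-memory sketch below assumes that quads are plain `(subject, predicate, object, graph)` tuples. `apply_rdf_fragment` is a hypothetical name.

```python
def apply_rdf_fragment(quads, graph, changed, fragment):
    """Apply an RDF fragment to an in-memory quad set, in place.

    quads:    set of (subject, predicate, object, graph) tuples
    graph:    the feed's server prefix, used as the named-graph URI
    changed:  resources the fragment describes
    fragment: set of (subject, predicate, object) triples
    """
    # Delete all statements about the changed resources, but only in this source's graph
    quads -= {q for q in quads if q[3] == graph and q[0] in changed}
    # Then add all statements in the fragment to the same graph
    quads |= {(s, p, o, graph) for (s, p, o) in fragment}
```

A SPARQL backend could express the same two steps with SPARQL 1.1 Update: a `DELETE WHERE` on `GRAPH <prefix> { <resource> ?p ?o }`, followed by `INSERT DATA` into the same graph.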
  • 37. TopicSL/TopicII
    Currently, topics can only be identified by subject identifier
    but not all topics have one
    Solution
    add elements for subject locators and item identifiers
    http://projects.topicmapslab.de/issues/3667
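If the proposal were adopted, a client could locate changed topics by whichever identifier the entry carries. In this sketch, the element names `sdshare:TopicSL` and `sdshare:TopicII` are assumptions taken from the slide title. So is the representation of local topics as dictionaries of identifier sets.

```python
import xml.etree.ElementTree as ET

SDSHARE = "{http://www.egovpt.org/sdshare}"
# sdshare:TopicSI exists today; TopicSL and TopicII are the proposed (assumed) additions
ID_KINDS = {"TopicSI": "subject_identifiers",
            "TopicSL": "subject_locators",
            "TopicII": "item_identifiers"}

def find_changed_topics(entry_xml, topics):
    """Return the local topics matched by any identifier in an Atom <entry>."""
    entry = ET.fromstring(entry_xml)
    found = []
    for tag, kind in ID_KINDS.items():
        for el in entry.findall(SDSHARE + tag):
            for topic in topics:
                if el.text in topic[kind] and topic not in found:
                    found.append(topic)
    return found
```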
  • 38. Paging of snapshots?
    What if the snapshot is vast?
    clients probably won't be able to download and store the entire thing in one go
    Could we page the snapshot into fragments?
    Or is there some other solution?
    http://projects.topicmapslab.de/issues/4307
  • 39. How to tell if the fragment feed is complete?
    When reading the fragment feed, how can we tell whether older fragments have been discarded?
    and how can we tell which was the newest fragment to be thrown away?
    Without this, there's no way to know for certain whether you've lost fragments when the feed stops before reaching the newest fragment you already have
    and if you're using "since", it always stops before that fragment...
    Should we add a new sdshare:foo element at the feed level for this information?
    http://projects.topicmapslab.de/issues/4308
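Suppose such a feed-level element existed and reported the `updated` time of the newest discarded fragment. The client-side check would then become a single comparison. `gap_possible` is a hypothetical helper, and timestamps are assumed to be UTC ISO 8601 strings.

```python
def gap_possible(newest_applied, newest_discarded):
    """Hypothetical check for lost fragments.

    newest_applied:   'updated' time of the newest fragment the client has applied
    newest_discarded: time of the newest fragment the server has thrown away,
                      or None if it has never discarded anything
    """
    if newest_discarded is None:
        return False  # nothing discarded, so nothing can have been lost
    # The server threw away something newer than anything we have seen
    return newest_discarded > newest_applied
```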
  • 40. Blank nodes are not supported
    What to do?
    http://projects.topicmapslab.de/issues/4306
  • 41. More information
    SDshare spec
    http://www.egovpt.org/fg/CWA_Part_1b
    SDshare issue tracker
    http://projects.topicmapslab.de/projects/sdshare
    SDshare use cases
    http://www.garshol.priv.no/blog/215.html