Archive integration at Mattilsynet
Bouvet Tech Meetup 2014-06-11
Lars Marius Garshol, larsga@sesam.no, http://twitter.com/...

Archive integration with RDF

Published in: Technology, Education

  1. Archive integration at Mattilsynet
     Bouvet Tech Meetup 2014-06-11
     Lars Marius Garshol, larsga@sesam.no, http://twitter.com/larsga
  2. Archive integrations
     A few systems integrated with the archive
       – every integration is expensive and painful
     Need many more integrations
       – to reduce amount of manual work
       – hesitation because of cost
     Consequences of integrations
       – if archive upgraded, must retest all systems
       – archive slows down integrated systems
       – changes to archive structure require rewriting all integrations
     [Diagram labels: Archive, Regulations, Line-of-business system #1, Line-of-business system #2, Web pages, Recruitment, Quality system]
  3. WebCruiter integration
     Very simple project
       – integrate WebCruiter with ePhorte
     Doing it with RDF because
       – it’s much easier and cheaper
       – want to extend to more integrations later
       – first step toward new architecture
     Good example project
       – because it’s so simple
  4. SESAM principles
     Base everything on RDF and SDShare feeds
       – dynamic flows of structured data
     Extracts from data sources do not map to a common model
       – instead, extract data as they are in the source
       – later translate to representation needed by consumers
       – this way, changes in source or target do not spill over to the other
     No hard bindings from code to data model
       – code should have no knowledge of the data model
       – all data model-specific logic should be configuration
       – makes data changes much easier to handle
  5. RDF?
     W3C standard
       – for interchange of structured data
       – has query language, schema languages, formats, ...
     Essentially a graph database
       – known as a triple store
       – like Neo4j or similar
       – but standardized
       – and with many extra features
     Note that databases are schemaless
       – so this is NoSQL
       – powerful query language with SPARQL
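To make the "graph database" and "schemaless" points concrete, here is a minimal Python sketch of data held as subject-predicate-object triples, with wildcard pattern matching standing in for a query language. It is a toy, not a real triple store: the class name `TinyTripleStore`, the `ex:` identifiers, and the person name are all made up for illustration.

```python
# A toy in-memory triple store. Real triple stores add SPARQL, indexing,
# and persistence; every identifier below is a made-up example.
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class TinyTripleStore:
    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> List[Triple]:
        """Return all triples matching the pattern; None is a wildcard."""
        return sorted(
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


store = TinyTripleStore()
store.add("ex:application1", "ex:title", "Søknad om betalingsutsettelse")
store.add("ex:application1", "ex:author", "ex:user123")
# No schema had to be declared before introducing a new kind of property.
store.add("ex:user123", "ex:name", "Kari Nordmann")

authored = store.match(p="ex:author")
```

Following the `ex:author` edge from the application to the user, and then the `ex:name` edge from the user, is what makes the data a graph rather than a set of tables.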
  6. Architecture
     [Architecture diagram. Box labels: WebCruiter WS, XML in files, SDShare, Translation, ePhorte, RDF, ePhorte adapter, Bus. Arrow labels: HTTP POST, SPARQL Update, external call.]
     Boxes in orange are Sesam components
  7. SDShare
     A protocol for tracking changes in a data source
       – essentially allows clients to keep track of all changes, for replication purposes
       – based on Atom and REST
     Data source can be anything
       – triple store
       – relational database
       – XML files on disk
       – ...
     Data flows as RDF
       – not an absolute must, but it’s how we do things
     A CEN specification
       – http://sdshare.org
  8. Basic workings
     Server publishes fragments representing changes in datastore
     Client pulls these in, updates local copy of dataset
     [Diagram: fragments flowing from Server to Client]
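The publish-and-pull cycle can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the SDShare protocol itself: there is no Atom feed or HTTP here, timestamps are plain integers, and the names `Fragment`, `ToyServer`, and `ToyClient` are hypothetical. It assumes each fragment carries the complete current set of statements about one resource, so the client replaces what it holds for that resource.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Fragment:
    resource: str         # the resource that changed
    updated: int          # change timestamp (a plain counter in this sketch)
    triples: Set[Triple]  # current statements about the resource


@dataclass
class ToyServer:
    fragments: List[Fragment] = field(default_factory=list)

    def fragments_since(self, since: int) -> List[Fragment]:
        """Fragments newer than `since`, oldest first."""
        newer = [f for f in self.fragments if f.updated > since]
        return sorted(newer, key=lambda f: f.updated)


@dataclass
class ToyClient:
    last_seen: int = 0
    local: Set[Triple] = field(default_factory=set)

    def sync(self, server: ToyServer) -> int:
        """Pull new fragments and apply them to the local copy."""
        applied = 0
        for frag in server.fragments_since(self.last_seen):
            # Drop the old statements about the resource, then add the new ones.
            kept = {t for t in self.local if t[0] != frag.resource}
            self.local = kept | frag.triples
            self.last_seen = frag.updated
            applied += 1
        return applied


server = ToyServer([
    Fragment("ex:doc1", 1, {("ex:doc1", "ex:title", "old title")}),
    Fragment("ex:doc1", 2, {("ex:doc1", "ex:title", "new title")}),
])
client = ToyClient()
first_run = client.sync(server)
second_run = client.sync(server)
```

Because the client remembers only the timestamp of the last fragment it applied, a second `sync` with no new changes does nothing.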
  9. From WebCruiter to triple store
     [Diagram: XML adapter, SDShare server, fragments, SDShare client, Triple store]
     On the server:
       • XPath queries to map to RDF
     On the client:
       • Two URLs
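A rough Python sketch of what "XPath queries to map to RDF" could look like: the mapping is pure configuration, and the code knows nothing about the data model. The XML layout, the `MAPPING` structure, and every URI are assumptions made for illustration; the actual WebCruiter export format and the real adapter's configuration are not shown in the slides. The sketch uses the limited XPath subset supported by Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical export file; the real WebCruiter XML is not shown in the slides.
SAMPLE_XML = """
<applications>
  <application id="a1">
    <title>Søknad om betalingsutsettelse</title>
    <author>123</author>
  </application>
</applications>
"""

# Hypothetical mapping configuration: property URI -> XPath expression.
MAPPING = {
    "record_path": "./application",
    "id_attribute": "id",
    "subject_prefix": "http://example.org/webcruiter/application/",
    "properties": {
        "http://example.org/webcruiter/title": "./title",
        "http://example.org/webcruiter/author": "./author",
    },
}


def xml_to_triples(xml_text, mapping):
    """Apply the configured XPath expressions and emit (s, p, o) triples."""
    root = ET.fromstring(xml_text)
    triples = []
    for record in root.findall(mapping["record_path"]):
        subject = mapping["subject_prefix"] + record.get(mapping["id_attribute"])
        for prop, path in mapping["properties"].items():
            for node in record.findall(path):
                triples.append((subject, prop, node.text))
    return triples


triples = xml_to_triples(SAMPLE_XML, MAPPING)
```

Supporting a new field would mean adding one entry under `properties`, with no change to `xml_to_triples`.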
  10. [No slide text]
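  11. Translation of metadata
      Input to the translator (Oversetter):
        Title: Søknad om betalingsutsettelse (application for deferral of payment)
        Process: 384192
        Author: 123
        Customer: 789
      Output from the translator:
        Tittel (title): Søknad om betalingsutsettelse
        Sak (case): 485283
        Ansvarlig (responsible): 456
        Kontakt (contact): 987
        Doktype (document type): I
        Arkivdel (archive part): 17
      [Diagram labels: Application, Archive, Active Directory; identifiers 123, xyz, 456, 789, 987]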
  12. How the mapping works
      Standard RDF vocabulary
        – mapping between properties
        – traversing properties to add values
        – uses owl:sameAs to map values
      Java implementation
        – called metadata-translator (~500 LOC)
        – uses very simple SDShare push protocol
        – writes translated data to Virtuoso
      Supports multiple mappings
        – configured using graphs so we know which properties and values to translate to
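The two simplest parts of the mapping, property-to-property mapping and value mapping via owl:sameAs, can be sketched as follows. This is an illustrative Python sketch, not the metadata-translator (which is Java and whose configuration vocabulary is not shown in the slides); the `src:` and `tgt:` identifiers and the `PROPERTY_MAP` structure are hypothetical, and property traversal is left out.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical property mapping: source property -> target property.
PROPERTY_MAP = {
    "src:title": "tgt:tittel",
    "src:author": "tgt:ansvarlig",
}

# Hypothetical value mapping, expressed as owl:sameAs statements.
MAPPING_GRAPH = {
    ("src:user/123", OWL_SAME_AS, "tgt:user/456"),
}


def translate(triples, property_map, mapping_graph):
    """Rename mapped properties and swap values that have a sameAs link."""
    same_as = {s: o for (s, p, o) in mapping_graph if p == OWL_SAME_AS}
    translated = []
    for s, p, o in triples:
        if p not in property_map:
            continue  # no mapping for this property, so the statement is dropped
        translated.append((s, property_map[p], same_as.get(o, o)))
    return translated


source = [
    ("src:doc1", "src:title", "Søknad om betalingsutsettelse"),
    ("src:doc1", "src:author", "src:user/123"),
    ("src:doc1", "src:unmapped", "ignored"),
]
result = translate(source, PROPERTY_MAP, MAPPING_GRAPH)
```

Keeping both maps as data means a second mapping for another target system is just another pair of tables, which is one way to read "supports multiple mappings".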
  13. What’s to be mapped?
      Department cannot be mapped
        – structure in WebCruiter added manually
      Users cannot be mapped, either
        – no common key
        – solved using Duke
      Department can be defaulted
        – in the cases where we know the user
      [Diagram labels: WebCruiter, ePhorte]
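When two systems share no common key for users, records have to be linked by comparing their attributes. The Python sketch below illustrates that idea with a simple name comparison from the standard library's `difflib`; it is not Duke and does not use Duke's API or configuration. The user identifiers, names, and the 0.85 threshold are invented for the example.

```python
import re
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def normalize(name: str) -> str:
    """Lowercase, strip punctuation, and sort tokens so word order is ignored."""
    tokens = re.findall(r"\w+", name.lower())
    return " ".join(sorted(tokens))


def link_users(source, target, threshold=0.85):
    """Emit an owl:sameAs link for each source user with a close enough match."""
    links = []
    for src_id, src_name in source.items():
        best_id, best_score = None, 0.0
        for tgt_id, tgt_name in target.items():
            score = SequenceMatcher(
                None, normalize(src_name), normalize(tgt_name)
            ).ratio()
            if score > best_score:
                best_id, best_score = tgt_id, score
        if best_id is not None and best_score >= threshold:
            links.append((src_id, OWL_SAME_AS, best_id))
    return links


# Made-up user records from the two systems.
webcruiter_users = {"wc:user/123": "Kari Nordmann", "wc:user/124": "Ola Hansen"}
ephorte_users = {"ep:user/456": "Nordmann, Kari", "ep:user/457": "Per Olsen"}

links = link_users(webcruiter_users, ephorte_users)
```

The resulting owl:sameAs statements are the kind of value mapping a translation step can then consume; users without a confident match simply get no link.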
  14. Data transfer to translation
      Simply write SPARQL queries to
        – produce fragment feed (based on timestamps)
        – produce a fragment (trivial)
        – produce a snapshot (trivial)
      Then configure SDShare client
        – just requires two URLs
        – translation receives an HTTP POST with the fragment, then does its job
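As a hedged illustration of the three queries, the Python sketch below builds SPARQL strings for a timestamp-based fragment feed, a single fragment, and a full snapshot. The queries actually used in the project are not shown in the slides; the `UPDATED_PROPERTY` URI and the function names are assumptions, and a real setup would need the source data to carry such a timestamp.

```python
from datetime import datetime, timezone

# Hypothetical property holding each resource's last-modified timestamp.
UPDATED_PROPERTY = "http://example.org/vocab/updated"


def fragment_feed_query(since: datetime) -> str:
    """SPARQL listing the resources changed after `since`, oldest first."""
    stamp = since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>\n"
        "SELECT ?resource ?updated WHERE {\n"
        f"  ?resource <{UPDATED_PROPERTY}> ?updated .\n"
        f'  FILTER (?updated > "{stamp}"^^xsd:dateTime)\n'
        "}\n"
        "ORDER BY ?updated"
    )


def fragment_query(resource_uri: str) -> str:
    """The 'trivial' fragment: every statement about one resource."""
    return (
        f"CONSTRUCT {{ <{resource_uri}> ?p ?o }} "
        f"WHERE {{ <{resource_uri}> ?p ?o }}"
    )


def snapshot_query() -> str:
    """The 'trivial' snapshot: every statement in the store."""
    return "CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }"


query = fragment_feed_query(datetime(2014, 6, 11, tzinfo=timezone.utc))
```

The URIs are interpolated directly into the query text here for brevity; code that accepts untrusted input would need to validate them first.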
  15. ePhorte adapter
      Receives RDF
        – introspects the RDF and translates to Java API
        – Java API is stubs calling SOAP services
      Given <foo> rdf:type <.../MyClass>
        – it looks up the Java class “MyClass” then instantiates
      Then, given <foo> <.../prop> “value”
        – it looks up method “setProp” on MyClass
        – calls object.setProp(“value”)
      That’s it
        – requires translation to produce RDF exactly aligned with Java API
        – means there’s no code
      https://github.com/Mattilsynet/arkivgrensesnitt
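The introspection idea can be sketched in Python, using `getattr` where the Java adapter would use reflection. This is an illustration of the technique only, not the code in the repository above: `MyClass` stands in for a generated SOAP stub, and the `CLASSES` registry and the example URIs are hypothetical.

```python
import re

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class MyClass:
    """Stand-in for an API stub class with a setter per property."""

    def __init__(self):
        self.prop = None

    def setProp(self, value):
        self.prop = value


# Hypothetical registry of the API classes the adapter may instantiate.
CLASSES = {"MyClass": MyClass}


def local_name(uri: str) -> str:
    """Last path or fragment segment of a URI, e.g. '.../MyClass' -> 'MyClass'."""
    return re.split(r"[/#]", uri)[-1]


def build_object(subject, triples):
    """Instantiate the class named by rdf:type, then call one setter per triple."""
    types = [o for s, p, o in triples if s == subject and p == RDF_TYPE]
    obj = CLASSES[local_name(types[0])]()
    for s, p, o in triples:
        if s != subject or p == RDF_TYPE:
            continue
        name = local_name(p)
        setter = getattr(obj, "set" + name[0].upper() + name[1:])
        setter(o)
    return obj


triples = [
    ("http://example.org/foo", RDF_TYPE, "http://example.org/api/MyClass"),
    ("http://example.org/foo", "http://example.org/api/prop", "value"),
]
obj = build_object("http://example.org/foo", triples)
```

Nothing in `build_object` mentions a specific class or property, which is why the incoming RDF has to line up exactly with the API's class and setter names.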
  16. Configuration
      [Architecture diagram repeated, annotated with how each component is configured. Box labels: WebCruiter WS, XML in files, SDShare, Translation, ePhorte, RDF, ePhorte adapter, Bus. Arrow labels: HTTP POST, external call. Annotations: XPath mapping, RDF mapping, SQL queries, SPARQL queries, “Look, ma, no code!” (twice), “not much code!”]
  17. Properties
      Adding more object types or properties is simple
        – we just extend the mapping (and maybe queries)
      Data quality improves with more data
        – if we don’t have the data to translate employees, that information gets lost
        – if the necessary mapping is added later, translation improves automagically
      Adding more systems is very easy
        – requires more SDShare feeds plus mappings
  18. The public journal problem
      [Diagram with zones Internet, DMZ, and Secure zone. Labels: Oracle, ePhorte, Journal app, ePhorte]
  19. The public journal solution
      [Diagram with zones Internet, DMZ, and Secure zone. Labels: Oracle, ePhorte, Journal app, Oracle, ePhorte, RDF, filtered, SDShare, SDShare]
  20. Conclusion
      Relatively small project, not that many hours
        – includes writing reusable ephorte-adapter
        – parts of writing the metadata translator, too
        – also the XML adapter
        – system documentation
        – automated deploy system based on Jenkins
      Flexible, simple solution
        – most of it reusable
        – actually captures, as a side-effect, information not available in any other system
  21. Questions?