
Building materialised views for linked data systems using microservices

In the BBC’s Content Distribution Services division, we build and maintain systems that provide content metadata to a wide range of audience-facing products.

Our current architecture for distributing tagging metadata consists mainly of two RDF-based read and write APIs feeding off a central triplestore. This single storage setup for all operations imposes restrictions on performance and scalability.

I will talk about our work creating an event-driven distribution pipeline that generates materialised views of tagging metadata.

The new microservices architecture comprises small, single-purpose services, lambda functions, event stores, queues and streams. The views are built on data stores optimised to serve specific query profiles, improving the overall performance and scalability of the system.

Published in: Technology


  1. Building materialised views of linked data systems using microservices Augustine Kwanashie
  2. Outline o  Introduction o  Current architecture and challenges o  Building materialised views o  Other things to consider
  3. Publish and distribute content metadata [diagram: Publish → Distribute]
  4. Metadata on Tagging [diagram: an article (“I’ll try to create like Beckham”, urn:cps:1289394, 01-07-2018:19:01:01) linked “about” to tags such as “English Football Team” and “Kieran Trippier”; each tag has a label and a locator, e.g. http://wikidata.org/123]
  5. Top 10 Articles About “English Football Team” Ordered by date published
  6. Simplified Architecture [diagram: Editorial Systems → Write API → Triplestore → Read API → Distribution Systems]
  7. SPARQL endpoints Performance and Data Integrity Flexibility and Pace of Innovation Custom APIs
  8. Projected performance by 2019 [chart: 60% increase in 99 percentile response time at 100% data volume]
  9. So what do we know about the API requests?
  10. We can group API requests by their query profiles
  11. Query by identifier CONSTRUCT { . . . } WHERE { <urn:01> a core:Article . . . . }
  12. Query with filters CONSTRUCT { . . . } WHERE { ?id property1 <urn:01> . ?id property2 "value2" . . . . }
  13. Multi-hop query CONSTRUCT { . . . } WHERE { ?id1 <urn:property1> ?id2 . ?id2 <urn:property2> ?id3 . ?id3 <urn:property3> "value3" . }
  14. We can group API requests by their volume and performance requirements
  15. Low volume and performance requirements: more complex queries. High volume and performance requirements: mostly simple queries.
  16. Build views that map closely to query profiles
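The idea on this slide can be sketched in a few lines: instead of answering the query-by-identifier profile with a SPARQL CONSTRUCT against the triplestore, the publish pipeline denormalises each entity into a document keyed by its ID, so a read becomes a single key lookup. This is a minimal in-memory sketch; the class, field and tag names are illustrative, not the BBC's actual schema.

```python
# In-memory sketch of a view optimised for the query-by-identifier
# profile: the publish pipeline folds an entity's triples into one
# document keyed by its ID, so a read is a single lookup rather than
# a SPARQL query.

class QueryByIdView:
    def __init__(self):
        self._docs = {}  # stands in for a key-value store

    def ingest(self, entity_id, triples):
        """Denormalise (subject, predicate, object) triples into a document."""
        doc = {"@id": entity_id}
        for _subject, predicate, obj in triples:
            doc.setdefault(predicate, []).append(obj)
        self._docs[entity_id] = doc

    def get(self, entity_id):
        """Serve the query-by-ID profile with one lookup."""
        return self._docs.get(entity_id)

view = QueryByIdView()
view.ingest("urn:cps:1289394", [
    ("urn:cps:1289394", "label", "I'll try to create like Beckham"),
    ("urn:cps:1289394", "about", "urn:tag:eft"),  # illustrative tag ID
])
```

The trade-off is the usual one for materialised views: writes do more work up front so that the high-volume, simple-query profile is cheap to serve.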
  17. Target architecture [diagram: Write API → Event Store → multiple Publish APIs building views (Query by ID, Multi-hop, …) → Read API → Distribute]
  18. The publish pipeline [diagram: Write API / Read API → Data Input Queue → Ingest λ → View DB → View API; example queue message: ID: 838394, Operation: Create, Timestamp: 1540906781999]
  19. Send to DLQ if errors persist [diagram: Write API / Read API → Input Queue → Ingest λ → View DB; failing messages go to a Dead Letter Queue]
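A minimal sketch of the retry/dead-letter behaviour described on this slide: the ingest step retries a message a few times and, if errors persist, parks it on a dead-letter queue instead of blocking the input queue. The retry limit and the plain-list DLQ are assumptions; in a real AWS pipeline this would typically be the queue's redrive policy.

```python
# Sketch of the ingest step's error handling: retry a few times, and
# if errors persist, send the message to the dead-letter queue for
# later inspection or replay. MAX_ATTEMPTS is an assumed limit.

MAX_ATTEMPTS = 3

def process_with_dlq(message, ingest, dead_letter_queue):
    """Return True if ingest succeeded, False if sent to the DLQ."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            ingest(message)
            return True
        except Exception:
            if attempt == MAX_ATTEMPTS:
                # errors persisted: keep the message rather than lose it
                dead_letter_queue.append(message)
                return False
```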
  20. Notify clients of a new ingest [diagram: Ingest λ → View DB → View API; an SNS Notifier λ announces the new ingest]
  21. Verify ingest is successful [diagram: Ingest λ → View DB → View API; a Verifier λ checks the Read API and sends failures to the Dead Letter Queue]
  22. The distribution pipeline [diagram: a Router in front of the Read APIs: the Triplestore-backed Read API, View API + View DB 1, View API + View DB 2]
  23. Route traffic based on profile and format If request matches { format: "ld+json" query: "?id=<GUID>" } Then route to View 1
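The matching rule on this slide can be sketched as a small routing table: a request whose format and query shape match a registered view profile is routed to that view, and everything else falls back to the triplestore-backed Read API. The table entries and the pattern below are assumptions for illustration.

```python
# Sketch of the router: match (format, query shape) against known
# view profiles; unmatched requests fall back to the triplestore.
import re

ROUTES = [
    # (format, query-string pattern, target back end) - illustrative
    ("ld+json", re.compile(r"^\?id=[\w:-]+$"), "view-1"),
]
FALLBACK = "triplestore"

def route(request_format, query_string):
    for fmt, pattern, target in ROUTES:
        if request_format == fmt and pattern.match(query_string):
            return target
    return FALLBACK
```

Because the rule is data, the traffic split and failover behaviour on the following slides amount to editing this table rather than redeploying the APIs.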
  24. Failover to the Triplestore [diagram: Router falls back from View API + View DB 1 to the Triplestore-backed Read API]
  25. Split traffic between Views [diagram: Router splits traffic between back ends, 60% of traffic / 40% of traffic]
  26. What about JOINS?
  27. { "@id": "urn:article:01", "about": [ "urn:tag:01", "urn:tag:02", … ] } { "@id": "urn:tag:01", "label": "Nigeria", "@type": "Place" }
  28. { "@id": "urn:article:01", "about": [ { "@id": "urn:tag:01", "label": "Nigeria", "@type": "Place" }, … ] }
  29. Previously… [diagram: Write APIs PUT <urn:article:01> and PUT <urn:tag:01> into the Triplestore; Read APIs serve the combined data]
  30. Join on Writes [diagram: the Publish API receives PUT <urn:article:01> and PUT <urn:tag:01> and builds a custom view for the combined data; the Read API distributes it]
  31. Join on Reads [diagram: separate Publish APIs handle PUT <urn:article:01> and PUT <urn:tag:01>; the Read API combines the data at distribution time]
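A sketch of the join-on-reads option, assuming two simple document stores: the article view holds only tag identifiers, and the Read API resolves and embeds each tag per request, producing the combined document shown on slide 28. The store contents are illustrative (urn:tag:02's label is invented here).

```python
# Sketch of join-on-reads: the article view stores only tag IDs; the
# Read API resolves each ID against the tag view at request time, so
# tag updates show up immediately in the combined document.

articles = {
    "urn:article:01": {"@id": "urn:article:01",
                       "about": ["urn:tag:01", "urn:tag:02"]},
}
tags = {
    "urn:tag:01": {"@id": "urn:tag:01", "label": "Nigeria", "@type": "Place"},
    "urn:tag:02": {"@id": "urn:tag:02", "label": "Lagos", "@type": "Place"},
}

def read_article(article_id):
    doc = dict(articles[article_id])  # shallow copy of the stored view
    doc["about"] = [tags[t] for t in doc["about"]]  # the join, per read
    return doc
```

Join-on-writes would instead perform this embedding in the publish pipeline, trading read-time work for the cost of re-publishing articles whenever a tag changes.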
  32. Other things to consider
  33. Tracking Ontology Changes biz:Company rdf:type owl:Class ; rdfs:comment "A company featured in BBC news"^^xsd:string ; rdfs:isDefinedBy <http://www.bbc.co.uk/ontologies/…> ; rdfs:label "A company featured in BBC news"^^xsd:string ; rdfs:subClassOf core:Organisation .
  34. Tracking Ontology Changes <http://www.bbc.co.uk/things/01#id> a biz:Company ; core:label "Amazon Inc." . <http://www.bbc.co.uk/things/01#id> a core:Organisation . Generated implicit triples
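The implicit triple on this slide follows from the ontology's rdfs:subClassOf statement on slide 33: anything typed biz:Company is also a core:Organisation. A minimal sketch of that inference step (`a` is the usual Turtle shorthand for rdf:type; the subclass table here is reduced to the one axiom shown):

```python
# Minimal subclass-closure sketch: for each explicit rdf:type triple,
# emit implicit type triples by walking up the rdfs:subClassOf chain.
# Only the axiom from slide 33 is included; real ontologies need the
# full hierarchy.

SUBCLASS_OF = {"biz:Company": "core:Organisation"}  # child -> parent

def implicit_type_triples(explicit_triples):
    inferred = []
    for subject, predicate, obj in explicit_triples:
        if predicate == "a":  # Turtle shorthand for rdf:type
            parent = SUBCLASS_OF.get(obj)
            while parent:  # follow the chain for deeper hierarchies
                inferred.append((subject, "a", parent))
                parent = SUBCLASS_OF.get(parent)
    return inferred
```

This is why ontology changes have to be tracked: editing a subClassOf axiom changes which implicit triples every materialised view should contain.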
  35. Single source of truth [diagram: an Ingest Script queries all IDs from the Triplestore and emits messages to the Publish API, e.g. ID: 838394, Operation: Create, Timestamp: 1540906781999]
  36. Summary: Using multiple data sources that match specific query types is feasible and beneficial
  37. Thank You
