Building materialised views of linked
data systems using microservices
Augustine Kwanashie
Outline
o  Introduction
o  Current architecture and challenges
o  Building materialised views
o  Other things to consider
Publish Distribute
Publish and distribute content metadata
[Figure: content item "I’ll try to create like Beckham" (01-07-2018:19:01:01, urn:cps:1289394) linked to the tags "English Football Team" (http://wikidata.org/123) and "Kieran Trippier". Link labels shown: about, locator, label. Callouts 1 to 3.]
Metadata on Tagging
Top 10 Articles
About “English Football Team”
Ordered by date published
Simplified Architecture
Write API
Triplestore
Read API
Distribution Systems
Editorial Systems
SPARQL endpoints
Performance and
Data Integrity
Flexibility and Pace
of Innovation
Custom APIs
Projected performance by 2019
60%
99th percentile response
time
100%
data volume
So what do we know about the API
requests?
We can group API requests by their
query profiles
Query by identifier
CONSTRUCT {
. . .
} WHERE {
<urn:01> a core:Article .
. . .
}
Query with filters
CONSTRUCT {
. . .
} WHERE {
?id property1 <urn:01> .
?id property2 "value2" .
. . .
}
Multi-hop query
CONSTRUCT {
. . .
} WHERE {
?id1 <urn:property1> ?id2 .
?id2 <urn:property2> ?id3 .
?id3 <urn:property3> "value3" .
}
We can group API requests by their
volume and performance
requirements
Low volume and
performance
requirements
High volume and
performance
requirements
More complex
queries
Mostly simple
queries
Build views that map closely to
query profiles
Target architecture
Event
Store
Write API
Publish API
Publish API
Publish API
Read API
Distribute
Query by ID
Multi-hop
The publish pipeline
λ
Ingest
View DB
View API
Write API
Read API
Data
Input Queue
ID: 838394
Operation: Create
Timestamp: 1540906781999
Send to DLQ if errors persist
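A minimal sketch of the ingest step, assuming the message shape shown above (ID, Operation, Timestamp). The in-memory queue, the `MAX_ATTEMPTS` retry budget and the `fetch` callable are hypothetical stand-ins for the real queue, lambda configuration and Read API call, none of which are specified in the talk.

```python
from collections import deque

MAX_ATTEMPTS = 3  # assumed retry budget; not stated in the talk


def ingest(queue, view_db, dead_letters, fetch):
    """Drain the input queue into the view DB, parking persistent failures in the DLQ."""
    while queue:
        message = queue.popleft()
        for attempt in range(1, MAX_ATTEMPTS + 1):
            try:
                # Fetch the current document and overwrite whatever the view holds.
                view_db[message["ID"]] = fetch(message["ID"])
                break
            except Exception:
                if attempt == MAX_ATTEMPTS:
                    # Errors persisted across every attempt: hand over to the DLQ.
                    dead_letters.append(message)


def fetch_from_read_api(doc_id):
    # Stand-in for a Read API call; fails for one ID to exercise the DLQ path.
    if doc_id == "bad":
        raise RuntimeError("Read API unavailable")
    return {"@id": doc_id}


input_queue = deque([
    {"ID": "838394", "Operation": "Create", "Timestamp": 1540906781999},
    {"ID": "bad", "Operation": "Create", "Timestamp": 1540906782000},
])
view_db, dead_letters = {}, []
ingest(input_queue, view_db, dead_letters, fetch_from_read_api)
```

Keeping failed messages whole in the DLQ means they can be replayed unchanged once the underlying fault is fixed.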
λ
Ingest
View DB
Write API
Read API
Input Queue
Dead Letter Queue
Notify clients of a new ingest
λ
Ingest
View DB
View API
λ
SNS
Notifier
Verify ingest is successful
λ
Ingest
View DB
View API
λ
Verifier
Read API
Dead Letter Queue
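One way the verifier could work, sketched under the assumption that it compares the document served by the View API with the one served by the Read API and sends mismatches to the Dead Letter Queue. The function name and the callables passed in are hypothetical.

```python
def verify(message, view_api, read_api, dead_letters):
    """Return True when the view holds the same document as the Read API."""
    doc_id = message["ID"]
    expected = read_api(doc_id)
    actual = view_api(doc_id)
    if actual == expected:
        return True
    # Mismatch or missing document: keep the message for later replay.
    dead_letters.append(message)
    return False


source = {"838394": {"@id": "838394", "label": "v2"}}
stale_view = {"838394": {"@id": "838394", "label": "v1"}}
verifier_dlq = []
ingest_message = {"ID": "838394", "Operation": "Create", "Timestamp": 1540906781999}

fresh_ok = verify(ingest_message, source.get, source.get, verifier_dlq)
stale_ok = verify(ingest_message, stale_view.get, source.get, verifier_dlq)
```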
The distribution pipeline
Read API
Triplestore
Router
View API
View DB 1
Read APIs
View API
View DB 2
Route traffic based on profile and format
If request matches {
format: "ld+json"
query: "?id=<GUID>"
}
Then route to View 1
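The rule above can be sketched as a small routing function. This is an illustration only: the backend names `view-1` and `triplestore`, the GUID pattern, and the representation of a request as a format string plus a parameter dict are all assumptions.

```python
import re

# Assumed GUID shape (8-4-4-4-12 hex digits); the talk does not define it.
GUID = re.compile(r"^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$")


def route(fmt, params):
    """Pick a backend name from the requested format and the query parameters."""
    is_id_lookup = set(params) == {"id"} and GUID.match(params["id"]) is not None
    if fmt == "ld+json" and is_id_lookup:
        return "view-1"
    # Anything that does not match a known profile goes to the general store.
    return "triplestore"


by_id = route("ld+json", {"id": "123e4567-e89b-12d3-a456-426614174000"})
by_filter = route("ld+json", {"about": "urn:tag:01"})
other_format = route("rdf+xml", {"id": "123e4567-e89b-12d3-a456-426614174000"})
```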
Failover to the Triplestore
Read API
Triplestore
Router
View API
View DB 1
Read APIs
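A hedged sketch of failover, assuming the router treats both an error and a missing document in the view as a reason to fall back to the Triplestore. Treating a miss as a failure is an assumption, and both API callables are hypothetical.

```python
def read_with_failover(doc_id, view_api, triplestore_api):
    """Try the view first; fall back to the triplestore on an error or a miss."""
    try:
        document = view_api(doc_id)
        if document is not None:
            return document, "view"
    except Exception:
        pass  # any view failure is treated as a cue to fail over
    return triplestore_api(doc_id), "triplestore"


def broken_view(doc_id):
    raise ConnectionError("view is down")


view_store = {"urn:article:01": {"@id": "urn:article:01"}}
triple_store = {
    "urn:article:01": {"@id": "urn:article:01"},
    "urn:article:02": {"@id": "urn:article:02"},
}

hit = read_with_failover("urn:article:01", view_store.get, triple_store.get)
miss = read_with_failover("urn:article:02", view_store.get, triple_store.get)
outage = read_with_failover("urn:article:01", broken_view, triple_store.get)
```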
Split traffic between Views
Read API
Triplestore
Router
View API
View DB 1
Read APIs
60% of traffic
40% of traffic
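A weighted split like this can be sketched with a weighted random choice per request. Which backend receives which share is an assumption here, as are the backend names; the fixed seed is only there to make the example repeatable.

```python
import random


def pick_backend(weights, rng):
    """Choose one backend name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[name] for name in names], k=1)[0]


rng = random.Random(0)
split = {"view-1": 60, "triplestore": 40}  # assumed assignment of the 60/40 shares
sample = [pick_backend(split, rng) for _ in range(10_000)]
view_share = sample.count("view-1") / len(sample)
```

Over many requests `view_share` settles close to 0.6, while any single request can still land on either backend.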
What about JOINS?
{
  "@id": "urn:article:01",
  "about": [
    "urn:tag:01",
    "urn:tag:02",
    …
  ]
}
{
  "@id": "urn:tag:01",
  "label": "Nigeria",
  "@type": "Place"
}
{
  "@id": "urn:article:01",
  "about": [
    {
      "@id": "urn:tag:01",
      "label": "Nigeria",
      "@type": "Place"
    },
    …
  ]
}
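The join shown above, from tag references to embedded tag documents, can be sketched as follows. The function name is hypothetical, and leaving an unknown tag as a bare reference is an assumption about how missing tags should be handled.

```python
def embed_tags(article, tags_by_id):
    """Return a copy of the article with each known tag reference replaced by its document."""
    joined = dict(article)
    joined["about"] = [
        tags_by_id.get(tag_id, tag_id)  # unknown tags stay as bare references
        for tag_id in article.get("about", [])
    ]
    return joined


article = {"@id": "urn:article:01", "about": ["urn:tag:01", "urn:tag:02"]}
tags = {"urn:tag:01": {"@id": "urn:tag:01", "label": "Nigeria", "@type": "Place"}}
combined = embed_tags(article, tags)
```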
Previously…
Write APIs
Triplestore
Read APIs
PUT <urn:article:01>
PUT <urn:tag:01>
combined data
Join on Writes
Publish API
Read API
Distribute
PUT <urn:article:01>
PUT <urn:tag:01>
custom view for
combined data
Join on Reads
Publish API
Publish API
Read API
Distribute
PUT <urn:article:01>
PUT <urn:tag:01>
combined data
Other things to consider
Tracking Ontology Changes
biz:Company
rdf:type owl:Class ;
rdfs:comment "A company featured in BBC news"^^xsd:string ;
rdfs:isDefinedBy <http://www.bbc.co.uk/ontologies/…> ;
rdfs:label "A company featured in BBC news"^^xsd:string ;
rdfs:subClassOf core:Organisation .
Tracking Ontology Changes
<http://www.bbc.co.uk/things/01#id> a biz:Company ;
core:label "Amazon Inc. " .
<http://www.bbc.co.uk/things/01#id> a core:Organisation .
Generated implicit triples
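A minimal sketch of how the implicit triple above could be generated outside the triplestore, assuming each class has at most one superclass and that triples are plain tuples of strings. Both are simplifications for illustration, not the approach described in the talk.

```python
def implicit_types(triples, sub_class_of):
    """Return the rdf:type triples implied by rdfs:subClassOf but not stated explicitly."""
    explicit = set(triples)
    inferred = set()
    for subject, predicate, obj in triples:
        if predicate != "rdf:type":
            continue
        seen = {obj}  # guards against cycles in the class hierarchy
        parent = sub_class_of.get(obj)
        while parent is not None and parent not in seen:
            seen.add(parent)
            candidate = (subject, "rdf:type", parent)
            if candidate not in explicit:
                inferred.add(candidate)
            parent = sub_class_of.get(parent)
    return inferred


ontology = {"biz:Company": "core:Organisation"}
explicit_triples = [
    ("http://www.bbc.co.uk/things/01#id", "rdf:type", "biz:Company"),
    ("http://www.bbc.co.uk/things/01#id", "core:label", "Amazon Inc. "),
]
generated = implicit_types(explicit_triples, ontology)
```

When the ontology changes, rerunning this over the affected subjects shows which view documents need to be republished.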
Single source of truth
Publish API
Triplestore
Ingest Script
Query all IDs
ID: 838394
Operation: Create
Timestamp: 1540906781999
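The ingest script can be sketched as a function that turns the list of IDs returned by the triplestore into Create messages for the publish pipeline. How the IDs are queried and how the messages are sent are left out; the function name and the `now_ms` parameter are hypothetical.

```python
import time


def build_replay_messages(ids, now_ms=None):
    """Build one Create message per ID, all stamped with the same timestamp in milliseconds."""
    timestamp = int(time.time() * 1000) if now_ms is None else now_ms
    return [
        {"ID": doc_id, "Operation": "Create", "Timestamp": timestamp}
        for doc_id in ids
    ]


replay = build_replay_messages(["838394", "838395"], now_ms=1540906781999)
```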
Summary:
Using multiple data sources that match
specific query types is feasible and
beneficial
Thank You
