Presentation held at the public defence of my doctoral thesis at the Department of Computer Science of Aalto University, Espoo, Finland, on 1 September 2017.
2. The Real-Time Semantic Web?
• The availability of networked sensors is increasing rapidly
• Fast stream processing of large quantities of data is becoming a critical
competitive advantage
• Example application areas: retail, investment markets, fraud detection, emergency response,
health services and farming
• Highly distributed, loosely coupled solutions based on common standards are
required
• Challenge to proprietary platforms
• Semantic web standards RDF, SPARQL and OWL offer a good base for
interoperability
• How would they work for event processing?
6.9.2017
2
4. Events And Event Objects
• Event = Anything that happens, or that can be contemplated as happening 1)
• Event Object = A record of observations in a system environment
[Figure: Measurements, Observations, Event, Event Object]
1) Luckham, D., Schulte, R.:
Event processing glossary – version 2.0 (Jul 2011)
5. Event Object Categories
• (Simple) events: “Seppo came in”, “Mikko came in”, “Esko came in”, “It is 9 a.m.”
• Complex event: “Seppo, Mikko and Esko are in.”
- Summarizes, represents, or denotes a set of other events 1)
• Composite event: “Meeting started in time”
- A derived event created by combining a set of other simple or complex events in a separable form 1)
1) Luckham, D., Schulte, R.:
Event processing glossary – version 2.0 (Jul 2011)
6. Events, States and Background knowledge
• Both timepoint and interval semantics are used for events in the
literature
• In this study single timepoint semantics have been chosen to support
immediate streaming of event objects
- Note: Does not preclude attaching multiple timestamps to an event, when time is measured
by different agents during processing
• “Any condition, which has a beginning and an end, either of
which may stretch to infinity” is hereby referred to as a “state”
• Statements without an associated time of validity are referred to
as “background knowledge”
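The three categories above can be sketched as a small classifier. This is only an illustration in Python: the `Statement` type, its field names and the numeric time values are assumptions made for the example, not definitions from the thesis.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Statement:
    """A statement with optional time information (hypothetical structure)."""
    text: str
    at: Optional[float] = None                     # single timepoint of an event
    interval: Optional[Tuple[float, float]] = None # (begin, end); either may be infinite


def classify(statement: Statement) -> str:
    """Classify a statement as an event, a state or background knowledge."""
    if statement.at is not None:
        return "event"                 # single timepoint semantics
    if statement.interval is not None:
        return "state"                 # has a beginning and an end
    return "background knowledge"      # no associated time of validity


examples = [
    Statement("Jim location1", at=10.0),
    Statement("Jim closeTo Alan", interval=(10.0, math.inf)),
    Statement("Jim friendOf Alan"),
]
labels = [classify(s) for s in examples]
```

An open-ended state is expressed here with `math.inf`, mirroring the "may stretch to infinity" wording of the definition.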
7. Example of Events, States and Background knowledge
• Background: friendOf a symmetricProperty; Jim friendOf Alan
• Events (simple): Jim location1; Alan location1; Jim location2; Alan location2; Jim location3; Alan location3
• Events (complex): “Jim is nearby!”; “Alan is nearby!”; “Moving away”
• States: Jim closeTo Alan
P I
8. Database vs. Stream Processing
[Figure: Traditional database processing (Customer; 1. Data, 2. Database, 3. Query, 4. Response) vs. Stream Processing (Customer; 1. Data)]
9. Data Stream Processing vs. Event Processing
[Figure: Data Stream Processing (Input stream, Stream window, Query, Output stream) vs. Event Processing (Event Processing Network of queries)]
Note: Event processing platforms typically
support data stream processing.
10. Complex Event Processing (“CEP”)
• “Computing that performs operations on complex events” 1)
• Simple lower-layer events abstracted into more tangible higher-layer
events
• Recognition of event patterns is often important
1) Luckham, D., Schulte, R.:
Event processing glossary – version 2.0 (Jul 2011)
[Figure: Event Processing Network (EPN): Event Producer, Event Channels, Event Processing Agents (EPA), Storage, Event Consumer]
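A minimal sketch of such a network, assuming Python generators as event channels. The agent names, the threshold and the "SustainedHigh" event type are hypothetical and chosen only to show a simple lower-layer event being abstracted into a higher-layer one.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Event = Dict[str, Any]


def producer(readings: Iterable[float]) -> Iterator[Event]:
    """Event producer: wraps raw readings into simple events."""
    for index, value in enumerate(readings):
        yield {"id": index, "value": value}


def filter_agent(events: Iterable[Event],
                 predicate: Callable[[Event], bool]) -> Iterator[Event]:
    """EPA 1: passes on only the events that satisfy the predicate."""
    for event in events:
        if predicate(event):
            yield event


def pattern_agent(events: Iterable[Event], run_length: int = 2) -> Iterator[Event]:
    """EPA 2: emits a complex event for each run of consecutive event ids."""
    run: List[Event] = []
    for event in events:
        if run and event["id"] != run[-1]["id"] + 1:
            run = []                      # the run was interrupted
        run.append(event)
        if len(run) == run_length:
            yield {"type": "SustainedHigh", "members": [m["id"] for m in run]}
            run = []


def run_network(readings: Iterable[float], threshold: float = 30.0) -> List[Event]:
    """Connects producer -> filter -> pattern detection -> consumer."""
    high = filter_agent(producer(readings), lambda e: e["value"] > threshold)
    return list(pattern_agent(high))      # the consumer simply collects the output


complex_events = run_network([10.0, 35.0, 36.0, 12.0, 40.0, 20.0, 41.0, 42.0])
```

Each generator consumes events as they arrive, so no agent needs to see the whole stream before producing output.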
11. World-Wide Web vs. Semantic Web
World Wide Web, “The Web of Documents”
• http://www.nokia.com/en_int/about-us (links: Networks, Innovation): “We create the technology to connect the world. Nokia is shaping the technologies at the heart of our connected world, to transform the human experience”
• http://www.nokiankaupunki.fi/matkailu/foreign_visitors/welcome_to_nokia/: “Welcome to Nokia!”
Semantic Web, “The Web of Data”
• Resources: http://dbpedia.org/resource/Category:Nokia, http://www.wikidata.org/entity/Q7215352, http://dbpedia.org/resource/Category:Electronics_companies
• Properties: owl:sameAs, skos:broader, rdfs:label (“Category:Nokia”, “Nokia”, “Electronics companies”)
• Short prefix for ontology URI
• rdfs:comment “The property that determines that two given individuals are equal.”
12. Resource Description Framework (RDF)
• Represents information on the Web
• Consists of three-element directed graphs, a.k.a. triples
[Figure: RDF graph of event object Eve1 (ep:EventObject) with :loc1, geo:lat 60.158776, geo:long 24.881490, geo:alt 122.37 and ep:hasEventObjectSamplingTime 2014-01-07T09:18:21]
<Eve1> a ep:EventObject;
ssn:Sensor :loc1;
geo:Point [ geo:lat 60.158776 ; geo:long 24.881490 ; ] ;
ep:hasEventObjectSamplingTime "2014-01-07T09:18:21"^^xsd:dateTime .
Turtle serialization format
P I, P III
13. SPARQL Protocol and RDF Query Language
• Query language for RDF data
SELECT ?a
WHERE { ?x rdfs:subClassOf ?y .
        ?a a ?x }
INSERT { ?a a ?y }
WHERE { ?x rdfs:subClassOf ?y .
        ?a a ?x }
1. Match basic graph pattern against the data graph
2. SELECT: Select and output result
2. INSERT: Add result to output graph
Rule known as “rdfs9” in RDFS entailment, “cax-sco” in OWL 2 RL
[Figure: example bindings Animal (?y), Rabbit (?x), ?a]
P V
• SPARQL Update (2013 in v. 1.1) added maintenance operations like
DELETE and INSERT
• Used in all included publications to build rules
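The matching behaviour of the SELECT and INSERT forms of the subclass rule can be imitated over an in-memory set of triples. This is a plain Python sketch, not a SPARQL engine; the individual `:bugs` and the class `:LivingThing` are hypothetical additions to the Rabbit/Animal example, and the repeat-until-no-change loop is an assumption standing in for continuous rule evaluation.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def select_instances(graph: Set[Triple]) -> Set[str]:
    """Mimics: SELECT ?a WHERE { ?x rdfs:subClassOf ?y . ?a a ?x }"""
    return {a
            for (x, p, _y) in graph if p == SUBCLASS
            for (a, q, c) in graph if q == RDF_TYPE and c == x}


def apply_rdfs9(graph: Set[Triple]) -> Set[Triple]:
    """Mimics: INSERT { ?a a ?y } WHERE { ?x rdfs:subClassOf ?y . ?a a ?x },
    re-applied until no new triples are produced."""
    result = set(graph)
    while True:
        inferred = {(a, RDF_TYPE, y)
                    for (x, p, y) in result if p == SUBCLASS
                    for (a, q, c) in result if q == RDF_TYPE and c == x}
        new = inferred - result
        if not new:
            return result
        result |= new


data = {
    (":Rabbit", SUBCLASS, ":Animal"),
    (":Animal", SUBCLASS, ":LivingThing"),
    (":bugs", RDF_TYPE, ":Rabbit"),
}
closure = apply_rdfs9(data)
```

The SELECT form only reports the matching `?a`, while the INSERT form writes the inferred type back into the graph, which is what lets a later match build on an earlier one.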
15. INSTANS Research Platform Positioning
[Figure: positioning matrix. Rows: Semantic Web, No Semantic Web. Columns: Static Database, Data Stream Processing, (Complex) Event Processing. Systems shown: INSTANS 2), EP-SPARQL, CQELS, C-SPARQL, Jena, RDFox, Stardog, Esper, WSO2, SQL, NoSQL, Aurora, STREAM. Comparisons marked: Publication I, Publication IV, Publication V]
2) Incremental eNgine for STAnding SPARQL
16. Proof of Concept
• Layered event processing application
“Close Friends”
• Collaboration of specification-compliant
SPARQL Query and Update
• Continuous, asynchronous processing using
a Rete-network
• Use of the local triple store as execution-time memory
• Structured RDF events consisting of
multiple triples
• Event processing benefits in processing
event patterns
• Comparison with a data stream processing
platform
P I
Mapping Event Processing Network elements to RDF / SPARQL domain
• Event Producer: RDF input stream to the current system
• Event Consumer: RDF output stream to console
• Storage: RDF triples in local storage
• EPA: (A Network of) SPARQL queries and rules
17. Complex Event Processing to RDF
• Support for complex and composite events
• Definitions to separate header and payload elements
• Discussion on matching complex and composite events using
SPARQL
• Suggested methods, limitations
• Support for multiple timestamps of an event object
• Sampling time, time of entering data stream, arrival in event processing
system, expiration time
P II
18. Validation of The Approach for CEP
• Examples of all types of event processing agents found in related literature 3), 4)
• Named RDF graphs
as event channels
P III
3) Etzion, O., Niblett, P., & Luckham, D. (2010).
Event Processing in Action (p. 325). Manning
Publications.
4) Taylor K., Leidinger L. (ESWC 2011).
Ontology-driven complex event processing in
heterogeneous sensor networks.
[Figure: Event Processing Agent types: Filter, Transformation (Translate (Enrich, Project), Aggregate, Split, Compose), Pattern Detect. Figure reproduced with publisher’s permission. 3)]
19. Query Network Example
• Aggregate computes a multiple-in single-out function of
incoming events
• Typical examples count, min, max, average, sum
• Calculate number of events per hour
• Uses an auxiliary query to extract data utilized by three other queries
P III
[Figure: Aggregate query network. Events In; Initialize (memhour, counter, eventhour); Extract Hour (eventhour); if eventhour = memhour: Increase Event Counter; if eventhour != memhour: Output Event Counts (Construct, Out) and Reset Memory (memhour, counter)]
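The events-per-hour aggregate can be sketched in Python using the same three memory items (memhour, counter, eventhour). This is an illustrative stand-in for the query network, not the SPARQL implementation; the function name and the hour key format are assumptions.

```python
from datetime import datetime
from typing import Iterable, Iterator, Optional, Tuple


def count_events_per_hour(timestamps: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yields (hour, count) whenever an event from a new hour arrives.

    The count of the last, still open hour is not output, because in a
    stream it is only known once an event from a later hour is seen.
    """
    memhour: Optional[str] = None   # hour currently being counted
    counter = 0
    for stamp in timestamps:
        # Extract Hour
        eventhour = datetime.fromisoformat(stamp).strftime("%Y-%m-%dT%H")
        if memhour is None:
            memhour = eventhour     # Initialize
        if eventhour != memhour:
            yield (memhour, counter)         # Output Event Counts
            memhour, counter = eventhour, 0  # Reset Memory
        counter += 1                         # Increase Event Counter


stream = [
    "2014-01-07T09:18:21",
    "2014-01-07T09:40:00",
    "2014-01-07T10:05:00",
    "2014-01-07T10:30:00",
    "2014-01-07T10:59:59",
    "2014-01-07T11:00:00",
]
hourly_counts = list(count_events_per_hour(stream))
```

Only two values are kept between events, so memory use does not grow with the length of the stream.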
20. Remote SPARQL Endpoint for Enriching Events
• Remote access to
background knowledge
P III
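PREFIX omgeo: <http://www.ontotext.com/owlim/geo#>
PREFIX ff: <http://factforge.net/>
# Copy each incoming event into a "Translated-" graph and add a location name
INSERT { GRAPH ?newgraph {
    ?event :locationName ?label .
    ?s ?p ?o } }
WHERE { GRAPH ?g {
    # Match only graphs whose name starts with "Input-Eve"
    FILTER ( strStarts(str(?g), concat(str(<>), "Input-Eve")) )
    ?s ?p ?o .
    ?event a ep:EventObject ;
           geo:Point [ geo:lat ?lat ; geo:long ?long ] .
    # Fetch the label of a nearby location from the remote endpoint
    SERVICE <http://factforge.net/sparql> {
      ?location omgeo:nearby(?lat ?long "1km") ;
                ff:preferredLabel ?label }
    # Name of the output graph, the same variable as used in INSERT
    BIND (IRI(concat("Translated-", strAfter(str(?event), str(<>))))
          AS ?newgraph) } }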
5) Linking Open Data cloud diagram 2017, by Andrejs Abele,
John P. McCrae, Paul Buitelaar, Anja Jentzsch and Richard
Cyganiak. http://lod-cloud.net/
[Figure: Linking Open Data cloud diagram 5); EPA type: Enrich]
21. Turtle vs. TriG
[Figure labels: Block 1, Block 2]
• RDF has challenges with event object boundaries6)
– Turtle grammar rule #6 specifies “triples”, but the order of triples impacts the composition of the block
6) Keskisärkkä, R., Blomqvist, E.: Event Object Boundaries in RDF Streams: A Position Paper. In:
ISWC 2013 Workshop: Proceedings of OrdRing 2013 - 2nd International Workshop on Ordering
and Reasoning. CEUR workshop proceedings, Sydney, Australia (October 2013)
• TriG extends the Turtle format for sending datasets
– Each event in a separate graph
Turtle (two orderings of the same triples):
_:5 geo:lat 60.158775;
    geo:long 24.88149 .
<Eve1> rdf:type ep:EventObject;
    ssn:Sensor :loc1;
    geo:Point _:5;
    ep:hasEventObjectSamplingTime "2014-01-07T09:18:21"^^xsd:dateTime .

<Eve1> rdf:type ep:EventObject;
    ssn:Sensor :loc1;
    geo:Point _:5;
    ep:hasEventObjectSamplingTime "2014-01-07T09:18:21"^^xsd:dateTime .
_:5 geo:lat 60.158775;
    geo:long 24.88149 .
vs.
TriG:
<Eve1> { _:5 geo:lat 60.158775;
             geo:long 24.88149 .
         <Eve1> rdf:type ep:EventObject;
             ssn:Sensor :loc1;
             geo:Point _:5;
             ep:hasEventObjectSamplingTime "2014-01-07T09:18:21"^^xsd:dateTime . }
P III
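Why a graph name settles the boundary question can be shown with a small Python sketch: once every statement carries the name of its graph, the order of arrival no longer affects which statements belong to which event. The quad tuples below are a simplified, hypothetical representation and not a TriG parser.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]
Quad = Tuple[str, str, str, str]   # (graph, subject, predicate, object)


def group_by_graph(quads: Iterable[Quad]) -> Dict[str, Set[Triple]]:
    """Collects the triples of each named graph, i.e. of each event."""
    events: Dict[str, Set[Triple]] = defaultdict(set)
    for graph, subject, predicate, obj in quads:
        events[graph].add((subject, predicate, obj))
    return dict(events)


blank_node_first = [
    ("<Eve1>", "_:5", "geo:lat", "60.158775"),
    ("<Eve1>", "_:5", "geo:long", "24.88149"),
    ("<Eve1>", "<Eve1>", "rdf:type", "ep:EventObject"),
    ("<Eve1>", "<Eve1>", "geo:Point", "_:5"),
]
event_first = [blank_node_first[2], blank_node_first[3],
               blank_node_first[0], blank_node_first[1]]

same_event = group_by_graph(blank_node_first) == group_by_graph(event_first)
```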
23. Performance Comparison with a Practical Example
• Counterfeit and theft detection in pharmaceutical manufacturing
• INSTANS compared with Esper event processing platform
• New execution environment coded for Esper in Scala
• RDF events converted to XML
• Event processing application separately coded for INSTANS in SPARQL
and Esper in EPL (“Event Processing Language”)
• Identical results verified
P IV
24. Example Segment of Task
• All products are identified
with an Electronic Product
Code (EPC)
• EPCs are scanned during
Commissioning and
Packing operations
• If a commissioned EPC is
not found in packing within
a specified delay, it should
be reported as stolen
• If a non-commissioned
EPC is found in packing, it
should be reported as
counterfeit
P IV
[Figure: timeline of Commissioning scans c(1), c(2), …, c(cp) at <reader102> and Packing scans p(1), p(2), …, p(ps) at <reader103> for EPC1 … EPCec, with delays Δtc, Δtp and Δtcp]
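The two detection rules of this task segment can be sketched in Python as follows. This is neither the SPARQL nor the EPL implementation used in the comparison; the tuple layout, the operation names and the `max_delay` parameter are assumptions. As a simplification, an EPC packed only after it has been reported stolen would additionally be reported as counterfeit.

```python
from typing import Dict, List, Tuple

Scan = Tuple[float, str, str]      # (time in seconds, operation, EPC)
Report = Tuple[float, str, str]    # (time of report, EPC, verdict)


def detect(scans: List[Scan], max_delay: float) -> List[Report]:
    """Processes time-ordered scans and reports stolen and counterfeit EPCs."""
    pending: Dict[str, float] = {}  # commissioned, not yet packed: EPC -> time
    reports: List[Report] = []
    for time, operation, epc in scans:
        # Every scan moves time forward: check all pending EPCs for expiry.
        for old_epc, commissioned_at in list(pending.items()):
            if time - commissioned_at > max_delay:
                reports.append((time, old_epc, "stolen"))
                del pending[old_epc]
        if operation == "commissioning":
            pending[epc] = time
        elif operation == "packing":
            if epc in pending:
                del pending[epc]    # commissioned and packed in time
            else:
                reports.append((time, epc, "counterfeit"))
    return reports


scans = [
    (0.0, "commissioning", "EPC1"),
    (1.0, "commissioning", "EPC2"),
    (2.0, "packing", "EPC1"),
    (3.0, "packing", "EPC9"),
    (20.0, "commissioning", "EPC3"),
]
reports = detect(scans, max_delay=10.0)
```

The expiry check scans every pending EPC on each event, which keeps the sketch short but makes its cost grow with the number of pending codes.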
26. Qualitative Comparison
• Move from RDF to XML collapses all semantic web benefits
• No connection to globally accessible ontologies
• No semantic web tools can operate on the stream
• No reasoning over an XML stream
• Logistics chain scenario
• With semantic web tools the manufacturer could create a graph or dataset of shipped
items for the warehouse
• Using only a link the warehouse could access a federated SPARQL endpoint to
compare received items with the shipping manifest
• Without these tools a dedicated solution with more task-specific coding would be
needed
P IV
27. Support for Reasoning
• Generate new knowledge out of facts and rules
• Survey of rule-based entailment regimes for RDF
data
• RDF, RDFS, ρdf, D*, P-Entailment, OWL 2 RL,
SWCLOS2
• Implementation on INSTANS using SPARQL
Query and Update
• Altogether 100 unique rules and 21
unsatisfiability conditions
• Compliance-tested using entailment tests of the
SPARQL 1.1 Test Suite
• Performance tested using the Lehigh University
Benchmark (LUBM)
P V
[Figure: overlap of the rule sets ρdf, RDF(S), D*, P-ent., OWL2RL and SWCLOS2; rule counts shown: 8, 9, 2, 48, 21, 6, 18, 3, 2, 3]
28. Customisation of Rules
• Rules are written in the same language (SPARQL) as the
queries, and can be packaged together
• Full control over the version of rules used with a particular query
• A complete entailment regime is rarely needed
• Current compliance test set only covers 17-47% of the rules
(depending on the regime)
• No LUBM query requires more than three rules; a total of eight
rules is enough to pass all 14 LUBM queries
- Some require rules only present in OWL 2 RL, which has 58 rules
P V
29. Event-Based Memory Handling
• Infinite streams of data eventually fill any memory or
database
• Resources need to be released when no longer required
• Publications I and III demonstrate explicit “cleanup-rules”, which
compare timestamps and DELETE any events except the latest one
• INSTANS provides an operational policy, which removes incoming
events after all dependencies have been computed in the Rete-network
- Not applicable to multi-input EPAs (Publication III, EPA 7)
• Publication V demonstrates an approach where background
knowledge and related materialisations are directed to a named
graph, and each event and its related materialisations are deleted
after processing
P V
[Figure: memory contents in three steps: 1. Static; 2. Static and Event; 3. Static (End)]
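A cleanup rule of the "keep only the latest event" kind can be sketched in Python as below. The list-based store and the function name are assumptions for illustration; in the publications the equivalent logic is written as SPARQL Update rules over the triple store.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Event = Tuple[str, datetime]   # (event id, sampling time)


def insert_with_cleanup(store: List[Event], event: Event) -> None:
    """Adds an event, then deletes every event except the latest one."""
    store.append(event)
    newest = max(sampled_at for _, sampled_at in store)
    # Compare timestamps and drop everything older than the newest event.
    store[:] = [e for e in store if e[1] == newest]


store: List[Event] = []
start = datetime(2014, 1, 7, 9, 18, 21)
for i in range(1000):
    insert_with_cleanup(store, (f"Eve{i}", start + timedelta(seconds=i)))

# A late-arriving older event is removed again straight away.
insert_with_cleanup(store, ("EveLate", start))
```

The store stays at one entry however many events pass through, which is the property needed for unbounded streams.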
31. Concepts Developed During Study
• RDF definitions for
• Composite and complex events
• Header and payload
• Timestamps for different phases of processing path
• Structured, heterogeneous events streamed as datasets using TriG format
• (Complex) event processing using networks of asynchronously operating,
specification-compliant SPARQL Update rules
• Mapping of all event processing network elements to SPARQL building blocks
• Operations on nested composite and complex events
• Recognition of event patterns
• Computation of aggregate values
• Configurable approaches to memory handling
• Materialisation-based reasoning
32. Event Processing Using RDF and SPARQL
• Tested with
• Structured, heterogeneous events
• Asynchronously operating networks of SPARQL queries and update rules
• Different types of event processing agents presented in literature
• Verified using an executable platform
• All tests with explicit sampling time in events
- Query answers are deterministic and repeatable
• Identical results with comparison platforms
• Code, queries, data and documentation openly available for independent
verification
33. Performance of INSTANS Platform
• Average notification delay 12 ms *) (P I)
• Considerably faster than a feasible window repetition rate in typical data stream
processing systems
• Real-life target of 2.83 EPC/s exceeded by a factor of 950 while processing
commissioning, packing and shipping on a regular laptop (P IV)
• Average speeds of up to 1,810 events / second (P III, EPA 6) and
21,195 triples / second (P V, LUBM Q1) demonstrated
• Using event-based memory handling and optimised sets of rules all
LUBM queries were completed up to a dataset of 100 universities
(13,405,383 triples)
*) Result obtained with older laptop HW using 1st generation INSTANS.
34. Performance Improvement Suggestions for INSTANS
• Support for multi-core processor architectures
• Current Common Lisp implementation uses only a single core
• Modular Rete-engine
• The whole EPN of P III was tested in one Rete-engine
• Running each EPA in a separate module would reduce complexity and unnecessary verifications
• Parallel HW
• Indexing of comparisons
• In P IV, experiment 4, the expiration of up to one million EPC codes is checked every time the time
moves forward
• Major improvements are achievable by indexing the comparisons; preliminary tests showed an 86-fold
speed-up for the Exp4 100k EPC case (from 20 EPC/s to 1,736 EPC/s)
35. Identified SPARQL Enhancements for Event Processing
• Allow combinations of DELETE + INSERT + ( SELECT or CONSTRUCT )
• SPARQL 1.1 grammar does not allow combined Query and Update
• Complexity reduction for cases where both result output and data processing are needed
• Rule priorities to specify execution order
• Wild card matching for GRAPH statements
• Reduce steps for matching individually named events on event channels
• Extensions for geographical coordinates and time arithmetics
• Dataset output capability for CONSTRUCT to enable TriG output
(syntax identical to INSERT)
• Extension for converting xsd:dateTime timestamps into
integer values
[Slide annotation (brace): Implemented in INSTANS]
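What converting an xsd:dateTime timestamp into an integer involves can be illustrated in Python. This is not the INSTANS extension itself: the function name is hypothetical, the result is whole seconds since 1970-01-01T00:00:00Z, and values without a timezone are assumed to be in UTC.

```python
from datetime import datetime, timezone


def datetime_to_epoch_seconds(lexical: str) -> int:
    """Converts an xsd:dateTime lexical form into whole seconds since the epoch."""
    if lexical.endswith("Z"):
        lexical = lexical[:-1] + "+00:00"   # fromisoformat expects an offset
    moment = datetime.fromisoformat(lexical)
    if moment.tzinfo is None:
        # Assumption: timestamps without a timezone are treated as UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    return int(moment.timestamp())


sampled = datetime_to_epoch_seconds("2014-01-07T09:18:21")
one_minute_later = datetime_to_epoch_seconds("2014-01-07T09:19:21")
elapsed = one_minute_later - sampled
```

With integer timestamps, time arithmetic such as the difference between two sampling times reduces to plain subtraction.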
36. Challenges of The Approach
• Implementation efficiency
• No a priori assumptions about the structure of an incoming event can be
made
- Every event could be different, but this flexibility is typically unnecessary and also complicates
writing queries
- In more rigid approaches event object schemas are pre-defined, and event objects can be
directly assigned to programming-language objects
• Management of concepts between ontologies is unnecessary in closed, well-defined environments
• Unnecessary complexity for closed environments requiring only
a few event types
37. Conclusions
• RDF and SPARQL offer a flexible framework for layered
processing of structured, heterogeneous events
• Best suited for distributed, loosely coupled environments
- Support for compatibility between ontologies, multiple event types, reasoning
• Simple, well-defined stream processing tasks in closed environments
requiring maximum performance can be executed more optimally using
other means
• Performance measured for a variety of cases on a test platform
• Sufficient for many real-world stream processing tasks