The document presents spChains, a declarative framework for processing streaming sensor data in pervasive applications. spChains uses a library of predefined stream processing blocks (spBlocks) that can be combined into processing chains to enable real-time computations on streaming data. The framework is implemented in Java and scales up to 200,000 events per second. It has been deployed in three data centers to monitor environmental data, electrical power consumption, and thermal flows.
spChains: A Declarative Framework for Data Stream Processing in Pervasive Applications
1. Politecnico di Torino
Dip. Automatica e Informatica
Torino, Italy
The 3rd International Conference on
Ambient Systems, Networks and Technologies
August 27-29, 2012, Niagara Falls, Ontario, Canada
http://elite.polito.it
spChains:
A Declarative Framework for Data Stream
Processing in Pervasive Applications
Dario Bonino, Fulvio Corno
2. Goals
Enable real-time ambient & sensor data processing
Allow AmI designers to easily specify required computations
Provide an extensible open source processing library
2 ANT’2012, Niagara Falls, Canada spChains
3. Outline
Motivation and Background
Stream processing
spChains Framework
Use cases
Conclusions
4. Motivation
Ambient Intelligence Systems
100s or 1,000s of sensors
Different physical quantities (°C, %H2O, kW, kWh, …)
Sampling frequencies from seconds to minutes
Huge stream of data being generated
Storage and retrieval
On-line processing
Off-line processing
Analytics
5. On-line processing: Applications
Data Decimation (from kHz to mHz)
Aggregation (over time, over space, over sensor types)
Averaging
Feeding User Displays and Dashboards
Computing up-to-date and user-meaningful information
Monitoring and Alerting
Checking Thresholds
Generating Alert messages
Virtual Sensors
Computing derivative quantities
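As an illustration of a virtual sensor, the sketch below derives an energy reading (kWh) from a stream of power samples (kW) by trapezoidal integration. The function name `energy_kwh` and the tuple-based sample format are assumptions made for this example; they are not part of spChains.

```python
from typing import Iterable, Tuple

PowerSample = Tuple[float, float]  # (timestamp in seconds, power in kW)


def energy_kwh(samples: Iterable[PowerSample]) -> float:
    """Integrate power over time with the trapezoidal rule.

    Samples are assumed to arrive in increasing timestamp order.
    """
    total = 0.0
    previous = None
    for timestamp, power in samples:
        if previous is not None:
            prev_timestamp, prev_power = previous
            hours = (timestamp - prev_timestamp) / 3600.0
            # Area of the trapezoid between two consecutive samples.
            total += 0.5 * (prev_power + power) * hours
        previous = (timestamp, power)
    return total


if __name__ == "__main__":
    # Half an hour at 2 kW, then a ramp from 2 kW to 4 kW over half an hour.
    print(energy_kwh([(0, 2.0), (1800, 2.0), (3600, 4.0)]))
```

The derived quantity changes far more slowly than the raw samples, which is why such virtual sensors typically publish at a much lower rate than their inputs.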
6. Requirements
Input: up to 10,000-100,000 events/second
Data: real-valued quantities, explicit units of measure
Output: real-valued or Boolean, often at much lower frequency
Computation: custom-defined depending on the application requirements
Operators: reusable standard temporal operations applicable to data streams
Usability: should not require a database expert to define computations; domain experts must be autonomous
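The "explicit units of measure" requirement can be sketched as an event record that carries its unit, so that comparisons against a measure such as 1 kW stay correct when a sensor reports in W. The field names mirror those used in the queries shown in the slides (`src`, `streamName`, `value`, `unitOfMeasure`); the conversion table and the `exceeds` helper are hypothetical and cover only a few power units.

```python
from dataclasses import dataclass

# Multipliers relative to the watt; an illustrative subset, not a full unit system.
_TO_WATT = {"mW": 1e-3, "W": 1.0, "kW": 1e3, "MW": 1e6}


@dataclass(frozen=True)
class RealEvent:
    src: str
    stream_name: str
    value: float
    unit_of_measure: str

    def in_watts(self) -> float:
        """Return the value converted to watts."""
        return self.value * _TO_WATT[self.unit_of_measure]


def exceeds(event: RealEvent, threshold: float, unit: str) -> bool:
    """True when the event is strictly greater than the given measure."""
    return event.in_watts() > threshold * _TO_WATT[unit]


if __name__ == "__main__":
    reading = RealEvent("meter-1", "M1", 1500.0, "W")
    print(exceeds(reading, 1.0, "kW"))
```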
7. Technology scouting
Standard Relational DBMS
Good for storage
Not efficient for computations
Rely on central servers
Custom programming
Perfect fit with application requirements
Very expensive to customize
NoSQL approaches
Great for storage
May do computations, require custom programming and expertise
Rely on central (or cloud) servers
Stream Processing
No storage
Excellent for computations
Requires custom expertise
8. Stream Processing
(or Complex Event Processing, CEP)
Event processing: tracking and analyzing streams of data «events», and deriving a conclusion from them
Defines a set of (fixed) queries
Event streams are analyzed in real time (often with in-memory processing) according to the programmed queries
Guarantees fast and scalable processing
Increasingly adopted in different domains: Business Process Management, Recommender Systems, Financial Services, Time Series, …
Several tools available (commercial and open source)
Specific skills needed to write efficient queries, in tool-dependent languages
9. Stream Processing
(or Complex Event Processing, CEP)
Example of tool-dependent queries (hourly average of stream M1, then an alert when the average rises above 1 kW):
insert into RealEvent(src, streamName, value, unitOfMeasure)
  select "Average", "Average-out", avg(value) as value, unitOfMeasure
  from realEvent(streamName="M1").win:time_batch("1h")
  group by src, streamName, unitOfMeasure;
insert into BooleanEvent(src, streamName, booleanValue)
  select "Threshold", "Threshold-out" as streamName, true as value
  from pattern [every (oldSample=RealEvent(streamName="Average-out",
      MeasureEventComparator.compareToMeasure(oldSample, "1kW",
        EventComparisonEnum.LESS_THAN_OR_EQUAL)) ->
    newSample=RealEvent(streamName=oldSample.streamName,
      MeasureEventComparator.compareToMeasure(newSample, "1kW",
        EventComparisonEnum.GREATER_THAN)))].win:length(2);
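The intent of the second query, an alert when a sample at or below the threshold is followed by one above it, can be read as a rising-edge detector. The sketch below is a simplified interpretation that looks only at consecutive samples and ignores units; it does not reproduce the exact semantics of the pattern operator, and the function name `rising_crossings` is hypothetical.

```python
from typing import Iterable, Iterator, Optional


def rising_crossings(values: Iterable[float], threshold: float) -> Iterator[int]:
    """Yield the index of each sample that rises above the threshold.

    A crossing is reported when the previous sample was <= threshold
    and the current sample is > threshold.
    """
    previous: Optional[float] = None
    for index, current in enumerate(values):
        if previous is not None and previous <= threshold < current:
            yield index
        previous = current


if __name__ == "__main__":
    hourly_averages_kw = [0.5, 0.8, 1.2, 1.5, 0.9, 1.0, 1.1]
    print(list(rising_crossings(hourly_averages_kw, threshold=1.0)))
```

Samples that stay above the threshold produce no further alerts, so the output stream is much sparser than the input.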
10. Proposed approach (1)
Stream Processing for event data processing in real time
(Extensible) Library of predefined operators (spBlocks)
Declarative framework (spChains) to express the required computations
Each Computation = Stream Processing Chain
Chain = Sequence of Stream Processing Blocks
Block = predefined operator, configured with parameters
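The chain/block relationship described above can be sketched with plain generator functions: each block is a configured operator over a stream, and a chain feeds the output of one block into the next. The block names `scale` and `delta` and the `chain` helper are hypothetical and chosen only for illustration; they are not the spBlocks library.

```python
from typing import Callable, Iterable, Iterator

Block = Callable[[Iterable[float]], Iterator[float]]


def scale(factor: float) -> Block:
    """Block configured with one parameter: multiply every sample by factor."""
    def run(stream: Iterable[float]) -> Iterator[float]:
        for value in stream:
            yield value * factor
    return run


def delta() -> Block:
    """Block without parameters: difference between consecutive samples."""
    def run(stream: Iterable[float]) -> Iterator[float]:
        previous = None
        for value in stream:
            if previous is not None:
                yield value - previous
            previous = value
    return run


def chain(*blocks: Block) -> Block:
    """Compose blocks so that each one consumes the previous block's output."""
    def run(stream: Iterable[float]) -> Iterator[float]:
        for block in blocks:
            stream = block(stream)
        return iter(stream)
    return run


if __name__ == "__main__":
    kw_to_w_then_delta = chain(scale(1000.0), delta())
    print(list(kw_to_w_then_delta([1.0, 1.5, 3.0])))
```

Because a chain has the same signature as a block, chains can themselves be nested inside other chains.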
11. Proposed approach (2)
The set of spChains is described as a simple XML file
All chains are automatically mapped to Stream Processing queries
Chain definition (XML):
<spXML:blocks>
  <spXML:block id="Avg1" function="AVERAGE">
    <spXML:param name="window" value="1" unitOfMeasure="h"/>
    <spXML:param name="mode" value="batch"/>
  </spXML:block>
  <spXML:block id="Th1" function="THRESHOLD">
    <spXML:param name="threshold" value="1" unitOfMeasure="kW"/>
  </spXML:block>
</spXML:blocks>
Generated Stream Processing queries:
insert into RealEvent(src, streamName, value, unitOfMeasure)
  select "Average", "Average-out", avg(value) as value, unitOfMeasure
  from realEvent(streamName="M1").win:time_batch("1h")
  group by src, streamName, unitOfMeasure;
insert into BooleanEvent(src, streamName, booleanValue)
  select "Threshold", "Threshold-out" as streamName, true as value
  from pattern [every (oldSample=RealEvent(streamName="Average-out",
      MeasureEventComparator.compareToMeasure(oldSample, "1kW",
        EventComparisonEnum.LESS_THAN_OR_EQUAL)) ->
    newSample=RealEvent(streamName=oldSample.streamName,
      MeasureEventComparator.compareToMeasure(newSample, "1kW",
        EventComparisonEnum.GREATER_THAN)))].win:length(2);
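To make the declarative side concrete, the sketch below reads a block definition like the one on this slide into plain dictionaries, which is roughly the information a mapper would need before generating queries. The namespace URI is a placeholder (the slides show only the `spXML` prefix), and `read_blocks` is a hypothetical helper, not the spChains loader.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI: the slides show only the "spXML" prefix.
SPXML_NS = "http://example.org/spXML"

CHAIN_XML = f"""
<spXML:blocks xmlns:spXML="{SPXML_NS}">
  <spXML:block id="Avg1" function="AVERAGE">
    <spXML:param name="window" value="1" unitOfMeasure="h"/>
    <spXML:param name="mode" value="batch"/>
  </spXML:block>
  <spXML:block id="Th1" function="THRESHOLD">
    <spXML:param name="threshold" value="1" unitOfMeasure="kW"/>
  </spXML:block>
</spXML:blocks>
"""


def read_blocks(xml_text: str) -> list:
    """Return one dict per block: id, function, and its parameters."""
    ns = {"spXML": SPXML_NS}
    root = ET.fromstring(xml_text.strip())
    blocks = []
    for block in root.findall("spXML:block", ns):
        # Each parameter maps to (value, unit); unit is None when absent.
        params = {
            p.get("name"): (p.get("value"), p.get("unitOfMeasure"))
            for p in block.findall("spXML:param", ns)
        }
        blocks.append({
            "id": block.get("id"),
            "function": block.get("function"),
            "params": params,
        })
    return blocks


if __name__ == "__main__":
    for block in read_blocks(CHAIN_XML):
        print(block)
```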
12. spChains Framework
[Figure: spChains architecture. Labels: Event Sources (Environmental Data), Pervasive/Ubiquitous Communication Infrastructure, Stream Processing Chains built from Stream Processing Blocks (spBlocks), Chain Definition, Event Drains (Pattern Match / Alerts, Aggregate / Computed Measures), Pervasive application(s), Final Users.]
15. Examples of spChains
<spXML:block id="Avg1" function="AVERAGE">
  <spXML:param name="window" value="1" unitOfMeasure="h"/>
  <spXML:param name="mode" value="batch"/>
</spXML:block>
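A batch-mode average with a fixed window can be sketched as follows: samples are accumulated until the window closes, one mean is emitted, and the accumulator is reset. This is a minimal interpretation of the `window` and `mode="batch"` parameters, assuming time-ordered samples and no output for empty or still-open windows; `batch_average` is a hypothetical name, not the spChains AVERAGE block.

```python
from typing import Iterable, Iterator, Tuple

Sample = Tuple[int, float]  # (timestamp in seconds, value)


def batch_average(samples: Iterable[Sample], window_s: int) -> Iterator[Sample]:
    """Yield (window_end, mean) each time a non-empty window closes.

    The first window starts at the first sample's timestamp. The last,
    still-open window is not flushed, as in a live stream.
    """
    window_end = None
    total, count = 0.0, 0
    for timestamp, value in samples:
        if window_end is None:
            window_end = timestamp + window_s
        # Close every window that ended at or before this sample.
        while timestamp >= window_end:
            if count:
                yield window_end, total / count
            total, count = 0.0, 0
            window_end += window_s
        total += value
        count += 1


if __name__ == "__main__":
    readings = [(0, 1.0), (1800, 3.0), (3600, 5.0), (5400, 7.0), (7200, 0.0)]
    for end, mean in batch_average(readings, window_s=3600):
        print(end, mean)
```

With a one-hour window, two samples per hour in the input become one averaged value per hour in the output.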
16. Implementation
Java spChains library (Apache v2.0 license), http://elite.polito.it/spchains
Core library
Esper bindings
Basic spBlock library
Scales up to 200 k events/sec
Already in use
3 different data centers, running on embedded PCs
Monitoring environment, electrical power consumption, thermal flows (heating and cooling), polled by means of the Dog2.x multiprotocol gateway
Computed quantities are “pushed” to Web Service collectors
Over 3 months of uptime, no issues found
17. Conclusions
Complex computations in the field and in real time
Efficient and easy to integrate
Lowered the barrier to adoption of Stream Processing
Future work
User interface
Large-scale installations
http://elite.polito.it
http://elite.polito.it/spchains
fulvio.corno@polito.it
dario.bonino@polito.it