This slide deck focuses on the deployment architecture for distributed stream processing, WSO2 Stream Processor’s high-level architecture, and concepts for distributed Siddhi applications.
3. WSO2 Stream Processor (WSO2 SP)
An open source, cloud native analytics product
optimized to create real-time, actionable insights for
agile digital businesses.
8. Why Distributed Stream Processing
• Availability of data
• Need for timely business insights
Together, these create requirements for
• High throughput
• Low latency
9. WSO2 SP Distributed Deployment
• Exactly-once processing
• Fault tolerance
• Highly scalable
• No back pressure
• Distributed development configurations via annotations
• Pluggable distribution options (Kubernetes, YARN, etc.)
10. Complexities Over Single Node
• Communication across components
• Fault tolerance of components
• Message semantics
• Message ordering
19. Sample Standalone Siddhi Application
@App:name('Energy-Alert-App')
@App:description('Energy consumption and anomaly detection')
@source(type = 'http', topic = 'device-power', @map(type = 'json'))
define stream DevicePowerStream (type string, deviceID string, power int, roomID string);
@sink(type = 'email', to = '{{autorityContactEmail}}', username = 'john', address = 'john@gmail.com', password = 'test',
    subject = 'High power consumption of {{deviceID}}',
    @map(type = 'text', @payload('Device ID: {{deviceID}} of room : {{roomID}} is consuming {{finalPower}}kW/h. ')))
define stream AlertStream (deviceID string, roomID string, initialPower double, finalPower double,
autorityContactEmail string);
@info(name = 'monitored-filter')
from DevicePowerStream[type == 'monitored']
select deviceID, power, roomID
insert current events into MonitoredDevicesPowerStream;
@info(name = 'power-increase-pattern')
partition with (deviceID of MonitoredDevicesPowerStream)
begin
    @info(name = 'avg-calculator')
    from MonitoredDevicesPowerStream#window.time(2 min)
    select deviceID, avg(power) as avgPower, roomID
    insert current events into #AvgPowerStream;

    @info(name = 'power-increase-detector')
    from every e1 = #AvgPowerStream -> e2 = #AvgPowerStream[(e1.avgPower + 5) <= avgPower] within 10 min
    select e1.deviceID as deviceID, e1.avgPower as initialPower, e2.avgPower as finalPower, e1.roomID
    insert current events into RisingPowerStream;
end;
20. Sample Standalone Siddhi Application (Contd.)
@info(name = 'power-range-filter')
from RisingPowerStream[finalPower > 100]
select deviceID, roomID, initialPower, finalPower, 'no-reply@powermanagement.com' as autorityContactEmail
insert current events into AlertStream;
@info(name = 'internal-filter')
from DevicePowerStream[type == 'internal']
select deviceID, power
insert current events into InternalDevicesPowerStream;
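To make the alerting logic of the sample easier to follow, here is a minimal Python sketch of one simplified reading of the 'power-increase-detector' pattern and the 'power-range-filter' query. It is not how Siddhi evaluates patterns internally. The `AvgPower` class, its `ts` field, and the function names are hypothetical, and events are assumed to arrive sorted by time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AvgPower:
    """One averaged reading; 'ts' (seconds) is a hypothetical event time."""
    ts: float
    device_id: str
    avg_power: float


def rising_power(events, min_rise=5.0, within_s=600.0):
    """Pair every event e1 with the first later event e2 of the same device
    whose average is at least min_rise higher, within within_s seconds."""
    matches = []
    for i, e1 in enumerate(events):
        for e2 in events[i + 1:]:
            if e2.ts - e1.ts > within_s:
                break  # outside the 10-minute bound; events are time-sorted
            same_device = e2.device_id == e1.device_id
            if same_device and e2.avg_power >= e1.avg_power + min_rise:
                matches.append((e1.device_id, e1.avg_power, e2.avg_power))
                break
    return matches


def alerts(matches, threshold=100.0):
    """Keep only matches whose final average power exceeds the threshold."""
    return [m for m in matches if m[2] > threshold]


readings = [
    AvgPower(0, "d1", 90.0),
    AvgPower(60, "d1", 93.0),
    AvgPower(120, "d1", 101.0),
    AvgPower(130, "d2", 50.0),
]
found = alerts(rising_power(readings))
```

With these readings, both earlier averages of device d1 (90 and 93) are followed by an average at least 5 higher that also exceeds 100, so two alerts are produced; d2 produces none.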
21. Basic Concepts
• Execution Group
– Collection of Siddhi queries
– Single execution unit
• Parallelism
– Number of parallel instances
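As an illustration of what parallelism means for a partitioned execution group, the sketch below routes events to one of several parallel instances by hashing a partition key, so that all events for one device reach the same instance. This is a common technique shown under stated assumptions, not necessarily how WSO2 SP distributes events; `instance_for` and `route` are hypothetical names.

```python
import zlib


def instance_for(partition_key: str, parallelism: int) -> int:
    """Map a partition key to an instance index in [0, parallelism)."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    # crc32 is stable across runs, unlike Python's built-in hash() for strings
    return zlib.crc32(partition_key.encode("utf-8")) % parallelism


def route(events, parallelism):
    """Split events into one bucket per parallel instance, keyed by deviceID."""
    buckets = [[] for _ in range(parallelism)]
    for event in events:
        buckets[instance_for(event["deviceID"], parallelism)].append(event)
    return buckets


events = [
    {"deviceID": "dev-1", "power": 80},
    {"deviceID": "dev-2", "power": 95},
    {"deviceID": "dev-1", "power": 110},
]
buckets = route(events, parallelism=2)
```

Because the instance is derived only from the key, per-device state such as a time window stays on a single instance regardless of how many instances run.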
22. Annotations
• execGroup, set with @dist(execGroup='...')
– Creates execution groups
• parallel, set with @dist(parallel='...')
– Provides parallelism to execution groups
– Provides parallelism to sources
• TransportChannelCreationEnabled
– Management option to control creation of channels
23. Sample Distributed Siddhi Application
@App:name('Energy-Alert-App')
@App:description('Energy consumption and anomaly detection')
@source(type = 'http', topic = 'device-power', @map(type = 'json'))
define stream DevicePowerStream (type string, deviceID string, power int, roomID string);
@sink(type = 'email', to = '{{autorityContactEmail}}', username = 'john', address = 'john@gmail.com', password = 'test',
    subject = 'High power consumption of {{deviceID}}',
    @map(type = 'text', @payload('Device ID: {{deviceID}} of room : {{roomID}} is consuming {{finalPower}}kW/h. ')))
define stream AlertStream (deviceID string, roomID string, initialPower double, finalPower double,
autorityContactEmail string);
@info(name = 'monitored-filter') @dist(execGroup='001')
from DevicePowerStream[type == 'monitored']
select deviceID, power, roomID
insert current events into MonitoredDevicesPowerStream;
@info(name = 'power-increase-pattern')@dist(parallel='2', execGroup='002')
partition with (deviceID of MonitoredDevicesPowerStream)
begin
    @info(name = 'avg-calculator')
    from MonitoredDevicesPowerStream#window.time(2 min)
    select deviceID, avg(power) as avgPower, roomID
    insert current events into #AvgPowerStream;

    @info(name = 'power-increase-detector')
    from every e1 = #AvgPowerStream -> e2 = #AvgPowerStream[(e1.avgPower + 5) <= avgPower] within 10 min
    select e1.deviceID as deviceID, e1.avgPower as initialPower, e2.avgPower as finalPower, e1.roomID
    insert current events into RisingPowerStream;
end;
24. Sample Distributed Siddhi Application (Contd.)
@info(name = 'power-range-filter')@dist(parallel='2', execGroup='003')
from RisingPowerStream[finalPower > 100]
select deviceID, roomID, initialPower, finalPower, 'no-reply@powermanagement.com' as autorityContactEmail
insert current events into AlertStream;
@info(name = 'internal-filter')@dist(execGroup='004')
from DevicePowerStream[type == 'internal']
select deviceID, power
insert current events into InternalDevicesPowerStream;
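To summarize how the distributed sample is split up, the following Python sketch collects the @dist annotations used above and tallies the parallel instances per execution group. It is an illustrative helper, not part of WSO2 SP: `exec_groups` is a hypothetical function, and treating a missing `parallel` value as 1 is an assumption.

```python
import re

# Matches @dist(...) and the key='value' pairs inside it
DIST_RE = re.compile(r"@dist\(([^)]*)\)")
PAIR_RE = re.compile(r"(\w+)\s*=\s*'([^']*)'")


def exec_groups(app_text):
    """Return {execGroup: parallelism} for every @dist annotation found."""
    groups = {}
    for match in DIST_RE.finditer(app_text):
        options = dict(PAIR_RE.findall(match.group(1)))
        if "execGroup" in options:
            # Assumption: parallelism is 1 when 'parallel' is not given
            groups[options["execGroup"]] = int(options.get("parallel", "1"))
    return groups


# The four @dist annotations from the distributed sample
SAMPLE = """
@dist(execGroup='001')
@dist(parallel='2', execGroup='002')
@dist(parallel='2', execGroup='003')
@dist(execGroup='004')
"""

plan = exec_groups(SAMPLE)
total_instances = sum(plan.values())
```

Under that assumption, the four execution groups of the sample expand to six query instances in total: one each for groups 001 and 004, and two each for groups 002 and 003.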