Gaining actionable insights in real time enables organizations to seize opportunities and avert threats. Sensing the world, detecting actionable insights, and acting upon them is now far easier than ever, thanks to advances in streaming SQL. Below are the topics discussed in this slide deck.
- Building stream processing applications using streaming SQL
- Deploying and monitoring streaming applications
- Scaling streaming applications
- Building domain specific business UIs
- Visualizing stream processing outputs via dashboards
3. Streaming Application
"An application that provides analytical operators to
orchestrate data flow, calculate analytics, and detect
patterns on event data from multiple, disparate live data
sources, allowing developers to build applications that
sense, think, and act in real time."
- Forrester
8. ● Lightweight, lean & cloud native
● Easy-to-learn Streaming SQL
● High-performance analytics with just 2 nodes (HA)
● Native support for streaming Machine Learning
● Long-term aggregations without batch analytics
● Highly scalable deployment with exactly-once processing
● Tools for development and monitoring
● Tools for business users to write their own rules
Overview of WSO2 Stream Processor
9. WSO2 Stream Processor
• Editor/Studio - Developer environment
• Worker/Resource - Resource node
• Dashboard
– Portal - Business dashboard
– Business Rules Manager - Management
console for business users
– Status Dashboard - Monitoring dashboard
• Manager - Job manager for distributed
processing
Profiles
16. Stream Processing
With WSO2 Stream Processor
Siddhi Streaming App
- Processes events in a streaming manner
- Isolated unit with a set of queries, input and
output streams
- SQL-like query language
from Sales#window.time(1 hour)
select region, brand, avg(quantity) as AvgQuantity
group by region, brand
insert into LastHourSales ;
[Diagram: a Siddhi app inside the Stream Processor. Input streams pass through Filter, Transform, Join, Aggregate, and Pattern operators (plus Siddhi extensions) to output streams]
20. Stream Processor Studio
• Writing Siddhi applications
– Syntax highlighting
– Auto completion
– Error reporting
– Documentation support
• Debugging Siddhi apps
– Inspect events
– Inspect query states
Developer Environment
21. Stream Processor Studio
• Testing Siddhi apps via Event Simulation
– Send Event by Event
– Simulate Random Data
– Simulate via CSV file
– Simulate from Database
• Support for running and testing in Python
– via PySiddhi
• IDE tools
– IntelliJ IDEA plugin
Developer Environment
29. Use Case 1 :
Production at each factory should not
drop below 5000 units per hour!
30. 1.1 Monitor and Identify events that
indicate low production
31. Total Amount Produced
define stream SweetProductionStream (name string, amount double);
from SweetProductionStream
select sum(amount) as hourlyTotal
insert into LowProductionAlertStream;
Calculate the total amount
produced over all time
32. Total Amount Produced in the Last Hour
define stream SweetProductionStream (name string, amount double);
from SweetProductionStream#window.time(1 hour)
select sum(amount) as hourlyTotal
insert into LowProductionAlertStream;
Calculate the total amount
produced in the last hour
33. Amount Produced Per Product
define stream SweetProductionStream (name string, amount double);
from SweetProductionStream#window.time(1 hour)
select name, sum(amount) as hourlyTotal
group by name
insert into LowProductionAlertStream;
Calculate total amount
produced for each product
34. Identify Low Production Rates
define stream SweetProductionStream (name string, amount double);
from SweetProductionStream#window.time(1 hour)
select name, sum(amount) as hourlyTotal
group by name
having hourlyTotal < 5000
insert into LowProductionAlertStream;
Filter events where produced
amount is less than 5000
35. Consider Working Hours for Calculation
define stream SweetProductionStream (name string, amount double);
from SweetProductionStream#window.time(1 hour)
select name, sum(amount) as hourlyTotal,
time:extract(currentTimeMillis(), 'HOUR') as currentHour
group by name
having hourlyTotal < 5000 and
currentHour > 9 and currentHour < 17
insert into LowProductionAlertStream;
Use functions to extract the
hour of event arrival time
36. Rate Limit Low Production Alerts
define stream SweetProductionStream (name string, amount double);
from SweetProductionStream#window.time(1 hour)
select name, sum(amount) as hourlyTotal,
time:extract(currentTimeMillis(), 'HOUR') as currentHour
group by name
having hourlyTotal < 5000 and
currentHour > 9 and currentHour < 17
output last every 15 min
insert into LowProductionAlertStream;
Send alerts every 15 minutes
38. Send Alerts via Email
@sink(type='email', to='manager@sf.com',
subject='Low Production of {{name}}!',
@map(type='text', @payload("""
Hi Manager,
Production of {{name}} has gone down to {{hourlyTotal}}
in the last hour!
From Sweet Factory""")))
define stream LowProductionAlertStream (name string, hourlyTotal double,
currentHour int);
Context-sensitive
email
39. Use Case 2 :
Raw material storage at the factories
should be closely monitored
40. 2.1 Store raw material shipment
details in a data store
41. Data Store Integration
● Allows performing operations on the data store while
processing events on the fly:
Store, Retrieve, Remove, and Modify
● Provides a REST endpoint to query the data store
● Query optimizations using Primary and Index keys
● Search ● Insert ● Delete ● Update ● Insert/Update
42. Store Raw Material Info
@source(type='http', @map(type='json'))
define stream RawMaterialStream(name string, amount double);
define table LatestShipmentDetailTable (name string, amount double);
In-memory table to
store last shipment of raw
material
43. Store Data
With Primary Key & Index
@source(type='http', @map(type='json'))
define stream RawMaterialStream(name string, amount double);
@PrimaryKey('name')
@Index('amount')
define table LatestShipmentDetailTable (name string, amount double);
Support for Primary Key and
Index for fast data access
44. Store in External Data Store
@source(type='http', @map(type='json'))
define stream RawMaterialStream(name string, amount double);
@store(type='rdbms', … )
@PrimaryKey('name')
@Index('amount')
define table LatestShipmentDetailTable (name string, amount double);
Table backed by
RDBMS, MongoDB, HBase, Cassandra, Solr,
Hazelcast, etc.
45. Insert Events into Table
@source(type='http', @map(type='json'))
define stream RawMaterialStream(name string, amount double);
@store(type='rdbms', … )
@PrimaryKey('name')
@Index('amount')
define table LatestShipmentDetailTable (name string, amount double);
from RawMaterialStream
select name, amount
insert into LatestShipmentDetailTable ;
Insert into table from stream
46. Update-Insert Events into Table
@source(type='http', @map(type='json'))
define stream RawMaterialStream(name string, amount double);
@store(type='rdbms', … )
@PrimaryKey('name')
@Index('amount')
define table LatestShipmentDetailTable (name string, amount double);
from RawMaterialStream
select name, amount
update or insert into LatestShipmentDetailTable
on LatestShipmentDetailTable.name == name ;
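Deleting from a table follows the same pattern. A minimal sketch, assuming a hypothetical DiscontinuedMaterialStream that signals raw materials to drop from the table:

```
define stream DiscontinuedMaterialStream (name string);
define table LatestShipmentDetailTable (name string, amount double);

-- remove every table row whose name matches the incoming event
from DiscontinuedMaterialStream
delete LatestShipmentDetailTable
on LatestShipmentDetailTable.name == name;
```

The delete ... on clause removes all table rows matching the condition, mirroring the update or insert ... on form shown above.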
Update or Insert into
the table with stream
48. Streaming Data Summarization
Aggregations Over Long Time Periods
• Incremental Aggregation for every
– Seconds, Minutes, Hours, Days, …, Year
• Support for out-of-order event arrival
• Fast data retrieval from memory and disk for
real-time updates
[Diagram: incremental aggregation buckets. Per-second values roll up into minute and hour buckets, with the current minute and current hour kept in memory]
49. Define Aggregation
define stream RawMaterialStream(name string, amount double);
define aggregation RawMaterialAggregation
from RawMaterialStream
select name, sum(amount) as totalAmount, avg(amount) as averageAmount
group by name
aggregate every min ... year
Calculate the total and average amount
for each product, from minute up to
year granularity
50. Define Aggregation ...
define stream RawMaterialStream(name string, amount double);
@store(type='rdbms', … )
define aggregation RawMaterialAggregation
from RawMaterialStream
select name, sum(amount) as totalAmount, avg(amount) as averageAmount
group by name
aggregate every min ... year
Like tables, aggregations can be stored in
RDBMS, MongoDB, HBase, Cassandra, Solr, Hazelcast, etc.
51. Data Retrieval API
• Can perform data search on Data
Stores or pre-defined Aggregations.
• Supports both REST and Java APIs
52. Retrieve Summarized Data
Perform REST Call
curl -X POST https://localhost:9443/stores/query \
-H "content-type: application/json" \
-u "admin:admin" \
-d '{"appName" : "Sweet-Factory-Analytics-3",
"query" : "from RawMaterialAggregation on name == \"caramel\" within \"2018-**-** **:**:**\" per \"minutes\" select name, totalAmount, averageAmount;"}'
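The same summarized data can also be retrieved inside a Siddhi app by joining a stream with the aggregation. A minimal sketch, assuming a hypothetical TriggerStream that fires the lookup:

```
define stream TriggerStream (name string);

-- join the trigger event with the pre-computed aggregation
from TriggerStream as t join RawMaterialAggregation as a
on a.name == t.name
within "2018-**-** **:**:**"
per "minutes"
select a.name, a.totalAmount, a.averageAmount
insert into SummaryResultStream;
```

Here within bounds the time range and per selects the granularity, just as in the REST query above.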
54. Portal
Dashboard & Widgets for Business Users
• Generate dashboards and widgets
• Fine-grained permissions
– Dashboard level
– Widget level
– Data level
• Localization support
• Inter widget communication
• Shareable dashboards with widget state persistence
56. Use Case 3 :
Warehouse managers should be
alerted if there will be a shortage of
raw material for future production
cycles
57. 3.1 Check if the current raw material
input rate is enough for production
58. Join Raw Material with Production Input
define stream RawMaterialStream (name string, amount double);
define stream ProductionInputStream (name string, amount double);
from ProductionInputStream#window.time(1 hour) as p
join RawMaterialStream#window.time(1 hour) as r
on r.name == p.name
select r.name, sum(r.amount) as totalRawMaterial,
sum(p.amount) as totalConsumption
group by r.name
having (totalConsumption - totalRawMaterial)*100.0 / totalRawMaterial > 5
insert into RawMaterialInputRateAlertStream ;
Identify 5% increase by
joining two streams
59. Join with External Window
define window RawMaterialWindow (name string, amount double) time(1 hour);
define stream ProductionInputStream (name string, amount double);
from ProductionInputStream#window.time(1 hour) as p
join RawMaterialWindow as r
on r.name == p.name
select r.name, sum(r.amount) as totalRawMaterial,
sum(p.amount) as totalConsumption
group by r.name
having (totalConsumption - totalRawMaterial)*100.0 / totalRawMaterial > 5
insert into RawMaterialInputRateAlertStream ;
Joining a stream with a
defined window
62. Using Pre-built PMML Model
define stream ProductionInputStream
(name string, currentHourAmount double,
previousHourAmount double);
from ProductionInputStream#pmml:predict('file/model.pmml', name,
previousHourAmount, currentHourAmount)
select name, nextHourAmount, getEventTime() as currentTime
insert into PredictedProdInputStream;
Predict required raw materials using a
static model
63. Online Machine Learning
define stream ProductionInputStream (currentHourAmount double,
previousHourAmount double );
define stream ProductionInputResultsStream ( currentHourAmount double,
previousHourAmount double, nextHourAmount double );
from ProductionInputResultsStream#streamingml:updateAMRulesRegressor
(currentHourAmount, previousHourAmount, nextHourAmount )
select *
insert into TrainOutputStream;
from ProductionInputStream#streamingml:AMRulesRegressor
(currentHourAmount, previousHourAmount )
select currentHourAmount, previousHourAmount, prediction as nextHourAmount
insert into PredictedProdInputStream;
Predict required raw materials while
learning in a streaming manner.
64. 3.3 Check predicted raw material
availability with warehouse stocks
and alert if insufficient
65. Predict & Alert
define window RawMaterialWindow (name string, amount double) time(1 hour);
define stream ProductionInputResultsStream ( currentHourAmount double,
previousHourAmount double, nextHourAmount double );
from ProductionInputResultsStream#streamingml:updateAMRulesRegressor
(currentHourAmount, previousHourAmount, nextHourAmount )
select *
insert into TrainOutputStream;
from PredictedProdInputStream as p join RawMaterialWindow as r
on r.name == p.name
select r.name, p.predictedAmount, sum(r.amount) as totalRawMaterial
having totalRawMaterial < p.predictedAmount
insert into RawMaterialInputRateAlertStream ;
66. Use Case 4 :
Factory Managers should be alerted if
production does not start within 15
min from raw material arrival
67. Non-occurrence through Patterns
define stream RawMaterialStream (name string, amount double);
define stream ProductionInputStream (name string, amount double);
from every (e1 = RawMaterialStream)
-> not ProductionInputStream[name == e1.name and
amount == e1.amount] for 15 min
select e1.name, e1.amount
insert into ProductionStartDelayed;
Identify non-occurrence pattern
68. Use Case 5 :
Alert factory managers if rate of
production continuously decreases for
X time period
70. Identify Trends
define stream SweetProductionStream(name string, amount double);
from SweetProductionStream#window.timeBatch(1 min)
select name, sum(amount) as amount, currentTimeMillis() as timestamp
group by name
insert into LastMinProdStream;
partition with (name of LastMinProdStream)
begin
from every e1=LastMinProdStream,
e2=LastMinProdStream[timestamp - e1.timestamp < 10 * 60000
and e1.amount > amount]*,
e3=LastMinProdStream[timestamp - e1.timestamp > 10 * 60000
and e2[last].amount > amount]
select e1.name, e1.amount as initialAmount, e3.amount as finalAmount
insert into ContinuousProdReductionStream;
end;
Identify decreasing trends
for 10 mins
72. Business Rules Manager
• Hides Siddhi app creation complexity from business users
• Build rules via a simple web-based UI
– From scratch:
Build custom filters on event streams
– From a template:
Build rules from developer-created templates
Dashboard for Rule Management
75. Template as Business Rules
define stream SweetProductionStream(...);
…
partition with (name of LastMinProdStream)
begin
from every e1=LastMinProdStream,
e2=LastMinProdStream[timestamp - e1.timestamp < $TimeInMin * 60000
and e1.amount > amount]*,
e3=LastMinProdStream[timestamp - e1.timestamp > $TimeInMin * 60000
and e2[last].amount > amount]
select e1.name, e1.amount as initialAmount, e3.amount as finalAmount
insert into ContinuousProdReductionStream;
end;
Identify decreasing trend for
X mins
77. Minimum HA with 2 Nodes
Stream Processor
Stream Processor
• High Performance
– Process around 100k
events/sec
– Just 2 nodes
– While most alternatives need 5+ nodes
• Zero Downtime
• Zero Event Loss
• Simple deployment with RDBMS
– No ZooKeeper, Kafka, etc.
• Multi Data Center Support
[Diagram: two-node HA deployment. Event sources feed two Stream Processor nodes running the same Siddhi apps, backed by an event store]
78. • Exactly-once processing
• Fault tolerance
• Highly scalable
• No back pressure
• Distributed development configurations via annotations
• Pluggable distribution options (YARN, Kubernetes, etc.)
Distributed Deployment
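The annotation-driven configuration mentioned above can be sketched as follows. This is a hedged example based on the @dist annotation from WSO2 SP 4.x; the execution group names and parallelism values are illustrative:

```
@App:name('Distributed-Sweet-Factory')

define stream SweetProductionStream (name string, amount double);

-- run the filtering query as two parallel instances
@dist(execGroup='filtering', parallel='2')
from SweetProductionStream[amount > 0]
select name, amount
insert into ValidProductionStream;

-- run the stateful aggregation as a single instance
@dist(execGroup='aggregation', parallel='1')
from ValidProductionStream#window.time(1 hour)
select name, sum(amount) as hourlyTotal
group by name
insert into HourlyProductionStream;
```

Each execGroup is deployed as a separate partial Siddhi app on the resource nodes, and parallel controls how many instances the manager schedules.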
92. • Finance and Banking
• Retail
• Location
• Operational
• Smart Energy
• Social Media
• System and Network
• Healthcare
Available Options
93. Running Siddhi on the Edge
● Lightweight and lean
● OOTB support for consuming events from Android
sensors
● Support for Python
○ https://github.com/wso2/PySiddhi/
In Android & Raspberry Pi