This presentation explains how to enable digital transformation through streaming analytics and how easily streaming applications can be implemented. We look at the following:
- The Architecture of WSO2 Stream Processor
- Understanding streaming constructs
- Patterns for processing data in real time, incrementally, and with intelligence
- Applying patterns when building streaming apps
- Deployment patterns
2. Goal
● Business scenarios for building streaming apps
● Why streaming patterns
● 11 patterns of building streaming apps
● When to use streaming patterns
● How WSO2 Stream Processor can help you to build streaming apps
● How to develop, deploy and monitor streaming apps
3. Why Streaming?
● Real-time: constant low milliseconds and under
● Near real-time: low milliseconds to seconds
● Offline: tens of seconds to minutes
● A stream is a series of events
● Almost all new data is streaming
● Detects conditions quickly
Image Source : https://www.flickr.com/photos/plusbeautumeurs/33307049175
4. Why Streaming Apps?
● Identify perishable insights
● Continuous integration
● Orchestration of business processes
● Embedded execution of code
● Sense, think, and act in real time
- Forrester
5. 1. Event-driven data integration
2. Real-time ETL
3. Generating event streams from passive data
4. Streaming data routing
5. Notification management
6. Real-time decision making
7. KPI monitoring
8. Citizen integration on streaming data
9. Dashboarding and reporting
Business Scenarios for Streaming
7. ● To understand what stream processing can do!
● Easy to solve common problems in stream processing
● Where to use what?
● Learn best practices
Why Patterns for Streaming?
Image Source : https://www.flickr.com/photos/laurawoodillustration/6986871419
9. 1. Data collection
2. Data cleansing
3. Data transformation
4. Data enrichment
5. Data summarization
6. Rule processing
7. Machine learning & artificial intelligence
8. Data pipelining
9. Data publishing
10. On-demand processing
11. Data presentation
Stream Processing Patterns
10. Streaming App Patterns
[Diagram: how the patterns compose within Stream Processing. Data Collection feeds Data Cleansing & Data Transformation and Data Enrichment (DB, Service) on the Streaming Data Integration side; Data Summarization & Rule Processing, Machine Learning & Artificial Intelligence (ML models), and On-demand Processing (Query API) form the Streaming Data Analytics side; Data Pipelining connects the stages, which end in Data Publishing and Data Presentation.]
17. The data type of Stream Processor is the Tuple
● An array[] containing values of type string, int, float, long, double, bool, or object
● Incoming and outgoing formats (JSON, XML, Text, Binary, Key-value, CSV, Avro, WSO2Event) are mapped to and from Tuples
3. Data transformation
18. Construct message from Tuple
● Output mapping
● JSON processing functions
● Map functions
● String concats
3. Data transformation
Extract data to Tuple
● Input mapping
● JSON processing functions
● Map functions
● String manipulation
define stream ProductionInputStream (json string);

from ProductionInputStream
select json:getString(json, "$.name") as name,
    json:getDouble(json, "$.amount") as amount
insert into ProductionStream;
Data Extraction
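The construction side works in reverse via output mapping on a sink. A minimal sketch, assuming the HTTP sink and JSON mapper extensions are available (the URL and payload shape are illustrative):

```sql
-- Publish ProductionStream events as a custom JSON payload.
-- {{name}} and {{amount}} are Siddhi output-mapping placeholders
-- filled from the Tuple; the publisher.url is a placeholder.
@sink(type='http', publisher.url='http://localhost:8080/production',
      @map(type='json',
           @payload("""{"product":"{{name}}","qty":{{amount}}}""")))
define stream ProductionStream (name string, amount double);
```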
19. Transform data by
● Inline operations
○ math & logical operations
● Inbuilt function calls
○ 60+ extensions
● Custom function calls
○ Java, JS, R
3. Data transformation
myFunction(item, price) as discount
define function myFunction[lang_name] return return_type {
function_body
};
str:upper(ItemID) as ItemCode,
amount * price as cost
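A hedged sketch of a custom function in JavaScript: in Siddhi script functions, parameters are passed in via the data[] array (the function, stream, and attribute names here are illustrative):

```sql
-- Custom JavaScript function: parameters arrive in the data[] array.
define function applyDiscount[javascript] return double {
    var price = data[0];
    var rate  = data[1];
    return price - (price * rate);
};

from OrderStream
select item, applyDiscount(price, 0.1) as discountedPrice
insert into DiscountedOrderStream;
```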
21. Types of data enrichment
● Datastore integration
○ RDBMS (MySQL, MSSQL, Oracle, Progress)
○ NoSQL (MongoDB, HBase, Cassandra)
○ In-memory grid (Hazelcast, Redis)
○ Indexing systems (Solr, Elasticsearch)
○ In-memory (in-memory table, window)
● Service integration
○ HTTP services
4. Data enrichment
22. Enriching data from a table (store)
4. Data enrichment
define stream ProductionStream (idNum int, amount double);

@store(type='rdbms', … )
@primaryKey('id')
@index('name')
define table ProductionInfoTable (id int, name string);

from ProductionStream as s join ProductionInfoTable as t
    on s.idNum == t.id
select t.name, s.amount
insert into ProductionInfoStream;
23. Enriching data from an HTTP service call
● Non-blocking service calls
● Handle error conditions: responses are routed by HTTP status code (2xx success, 4xx failure)
4. Data enrichment
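The request/response wiring can be sketched as below, assuming the WSO2 SP HTTP extension; the annotation and parameter names (http-request, http-response, sink.id, http.status.code) should be verified against the extension version in use, and the URL is a placeholder:

```sql
-- Send a non-blocking enrichment request; sink.id correlates responses.
@sink(type='http-request', publisher.url='http://localhost:8080/lookup',
      sink.id='enrich', @map(type='json'))
define stream LookupRequestStream (idNum int, amount double);

-- 2xx responses carry the enriched payload.
@source(type='http-response', sink.id='enrich',
        http.status.code='2\\d+', @map(type='json'))
define stream EnrichedStream (name string, amount double);

-- 4xx responses are routed separately for error handling.
@source(type='http-response', sink.id='enrich',
        http.status.code='4\\d+', @map(type='json'))
define stream EnrichFailureStream (idNum int, amount double);
```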
25. Types of data summarization
● Time based
○ Sliding time window
○ Tumbling time window
○ Multiple time intervals (secs to years)
● Event count based
○ Sliding length window
○ Tumbling length window
● Session based
● Frequency based
5. Data summarization
Types of aggregations
● Sum
● Count
● Min
● Max
● distinctCount
● stdDev
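Two of the window types above, sketched in Siddhi (stream and attribute names are illustrative):

```sql
-- Sliding time window: per-name total over the last minute,
-- re-evaluated as each event arrives.
from ProductionStream#window.time(1 min)
select name, sum(amount) as pastMinuteTotal
group by name
insert into MinuteSummaryStream;

-- Tumbling length window: min/max emitted once per batch of 100 events.
from ProductionStream#window.lengthBatch(100)
select min(amount) as minAmount, max(amount) as maxAmount
insert into BatchStatsStream;
```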
26. Multiple time interval based summarization
● Aggregation for every second, minute, hour, ..., year
● Built using the lambda (λ) architecture
● Real-time data in-memory
● Historic data from disk
● Works with RDBMS data stores
5. Data summarization
from ProductionAggregation
within "2018-12-10", "2018-12-13"
per "days"
select sales;
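The ProductionAggregation referenced above is defined once and then maintained incrementally at every granularity; a sketch (stream and attribute names are illustrative):

```sql
define stream ProductionStream (name string, amount double);

-- Incrementally aggregated per second, minute, hour, day, month, year.
define aggregation ProductionAggregation
from ProductionStream
select name, sum(amount) as totalAmount, avg(amount) as avgAmount
group by name
aggregate every sec ... year;
```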
28. Types of predefined rules
● Rules on single event
○ If-then-else, match, etc.
● Rules on collection of events
○ Summarization
○ Join with window or table
● Rules based on event occurrence order
○ Pattern detection
○ Trend (sequence) detection
○ Non-occurrence of event
6. Rule processing
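For the occurrence-order rules, a minimal trend-detection sketch using a Siddhi sequence (stream and attribute names are illustrative):

```sql
define stream TempStream (deviceId string, temp double);

-- Detect two consecutive readings from the same device where the
-- temperature rises (a simple increasing trend).
from every e1=TempStream,
      e2=TempStream[deviceId == e1.deviceId and temp > e1.temp]
select e1.deviceId, e1.temp as initialTemp, e2.temp as risenTemp
insert into RisingTempStream;
```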
29. Non-occurrence of event pattern detection
6. Rule processing
define stream DeliveryStream (orderId string, amount double);
define stream PaymentStream (orderId string, amount double);

from every (e1 = DeliveryStream)
    -> not PaymentStream[orderId == e1.orderId] for 15 min
select e1.orderId, e1.amount
insert into PaymentDelayedStream;
33. 8. Data pipelining
Types of data pipelines
● Sequential data processing
○ Default behaviour
○ All queries are processed by the data retrieval thread
● Asynchronous data processing
○ Processed in parallel as event batches
○ @Async(buffer.size='256', workers='2', batch.size.max='5')
● Scatter and gather
○ json:tokenize() -> process -> window.batch() -> json:setElement()
○ str:tokenize() -> process -> window.batch() -> str:groupConcat()
34. ● Sequential data processing
○ Default behavior
○ All queries are processed by the data retrieval thread
8. Data pipelining
35. ● Asynchronous data processing
○ Processed in parallel as event batches
8. Data pipelining
@Async(buffer.size='256', workers='2', batch.size.max='5')
define stream ProductionStream (name string, amount double);
36. ● Scheduled data processing
○ Periodically trigger an execution flow
○ Based on
■ A given time period
■ A cron expression
8. Data pipelining
define trigger FiveMinTriggerStream at every 5 min;
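The trigger stream can then drive a periodic flow, for example by joining it with a table; a sketch (table and attribute names are illustrative):

```sql
-- Every five minutes, snapshot the table contents downstream.
from FiveMinTriggerStream join ProductionInfoTable as t
select t.id, t.name
insert into PeriodicSnapshotStream;
```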
37. ● Scatter and gather
○ Divide into sub-elements, process each, and combine the results
○ E.g.
■ json:tokenize() -> process -> window.batch() -> json:setElement()
■ str:tokenize() -> process -> window.batch() -> str:groupConcat()
8. Data pipelining
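A scatter-and-gather sketch along the str:tokenize() path listed above, assuming the string extension is installed; the token output attribute and the function signatures should be checked against your extension version:

```sql
define stream CsvStream (payload string);

-- Scatter: split the payload into one event per token.
from CsvStream#str:tokenize(payload, ',')
select token as item
insert into ItemStream;

-- Process each sub-element independently.
from ItemStream
select str:upper(item) as item
insert into ProcessedItemStream;

-- Gather: batch tokens from the same original event and re-combine.
from ProcessedItemStream#window.batch()
select str:groupConcat(item, ',') as payload
insert into OutputStream;
```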
43. 10. On-demand processing
● Processing stored data using REST APIs
○ Data stores (RDBMS, NoSQL, etc.)
○ Multiple time interval aggregations
○ In-memory windows and tables
44. 10. On-demand processing
● Running streaming queries via REST APIs
○ Synchronous request-response loopback
○ Understand the current state of the environment
46. 11. Data presentation
Data loaded to Data Stores
● RDBMS, NoSQL & In-Memory stores
Exposed via REST APIs
● On-demand data query APIs
● Running streaming queries or query data stores
curl -X POST https://localhost:7443/stores/query
-H "content-type: application/json"
-u "admin:admin"
-d '{"appName" : "RoomService",
"query" : "from RoomTypeTable select *" }'
-k
47. 11. Data presentation
Presented as reports
● PDF, CSV
● Report generation
○ On-demand and periodic reports using Jasper Reports
○ Exported from dashboard
48. 11. Data presentation
Visualized using dashboard
● Widget generation
● Fine-grained permissions
○ Dashboard level
○ Widget level
○ Data level
● Localization
● Inter-widget communication
● Shareable dashboards
49. 1. Data collection
2. Data cleansing
3. Data transformation
4. Data enrichment
5. Data summarization
6. Rule processing
7. Machine learning & artificial intelligence
8. Data pipelining
9. Data publishing
10. On-demand processing
11. Data presentation
Stream Processing Patterns
51. Developer Studio for Streaming Apps
Drag-and-drop query builder & source editor
Edit, Debug, Simulate, & Test
All in one place!
52. Citizen Integration for Streaming Data
Rule Building: build rule templates using the editor
Rule Configuration: configure rules via a form-based UI for non-technical users
54. Stream Processing at the Edge or Embedded
• Stream processing at the sources
– Embedded in Java or Python applications
– At the edge as a sidecar
– Micro Stream Processor
• Local decision making to build intelligent systems
• ETL at the source
• Event routing
• Edge analytics
[Diagram: edge Stream Processors running Siddhi apps at the event sources feed a central Stream Processor, which drives dashboards, notifications, invocations, data stores, and an event store, with a feedback loop back to the edge.]
55. High Availability with 2 Nodes
• 2-node minimum HA
– Processes up to 100,000 events/sec
– While most other stream processing systems need around 5+ nodes
• Zero event loss
• Incremental state persistence and recovery
• Multi data center support
[Diagram: two Stream Processor nodes running the same Siddhi apps consume from the event sources and publish to dashboards, notifications, invocations, data sources, and an event store.]
56. Distributed Deployment
• Exactly-once processing
• Fault tolerance
• Highly scalable
• No back pressure
• Distributed via annotations
• Native support for Kubernetes
60. ● Lightweight, lean, and high performance
● Best suited for
○ Streaming Data Integration
○ Streaming Analytics
● Streaming SQL & graphical drag-and-drop editor
● Multiple deployment options
○ Process data at the edge (Java, Python)
○ Micro Stream Processing
○ High availability with 2 nodes
○ Highly scalable distributed deployments
● Support for streaming ML & long-running aggregations
● Monitoring tools and citizen integration options
WSO2 Stream Processor
61. 1. Event-driven data integration
2. Real-time ETL
3. Generating event streams from passive data
4. Streaming data routing
5. Notification management
6. Real-time decision making
7. KPI monitoring
8. Citizen integration on streaming data
9. Dashboarding and reporting
Business Scenarios for Streaming
62. ● Business scenarios for building streaming apps
● Why streaming patterns
● 11 patterns of building streaming apps
● When to use streaming patterns
● How WSO2 Stream Processor can help you to build streaming apps
● How to develop, deploy and monitor streaming apps
We covered