
[WSO2Con EU 2018] The Rise of Streaming SQL

This session takes an in-depth look at:
- Trends in stream processing
- How streaming SQL has become a standard
- The advantages of Streaming SQL
- Ease of development with streaming SQL: Graphical and Streaming SQL query editors
- Business value of streaming SQL and its related tools: Domain-specific UIs
- Scalable deployment of streaming SQL: Distributed processing

Published in: Technology

  1. The Rise of Streaming SQL. Sriskandarajah Suhothayan, Director, WSO2
  2. Why Streaming? Real-time: constant low milliseconds and under. Near real-time: low milliseconds to seconds. Offline: 10s of seconds to minutes. ● A stream is a series of events ● Almost all new data is streaming ● Detects conditions quickly. Image Source: https://www.flickr.com/photos/plusbeautumeurs/33307049175
  3. Why Streaming Apps? ● Identify perishable insights ● Continuous integration ● Orchestration of business processes ● Embedded execution of code ● Sense, think, and act in real time (Source: Forrester)
  4. What is Streaming Data? A series of events/data having the same schema/format, appearing continuously. Coke 24 Fanta 14 Sprite 20 Coke 4 <coke>24</coke> <fanta>14</fanta> <sprite>20</sprite> <coke>4</coke>
  5. Almost All Data is Streaming! All data is generated one event at a time, so even batch data was streaming at one point ● Logs ● Transaction data ● Sensor data ● Traffic data. Data is streaming at the source!
  6. Streaming Data Processing ● Process data at the source, or process it before we store it ● Identify insights in real time and act immediately ● Reduce unnecessary data storage and batch processing. Diagram labels: Logs, Sensors, Devices, Apps, Services; Stream Processing; Alerts, Dashboards, Services, Databases
  7. Streaming Data Processing Operations ● Event-driven architecture ● Streaming data integration ● Streaming data preprocessing ● Data store & service integration ● Streaming data summarization ● KPI analysis and alerts ● Event correlation ● Pattern matching & trend analysis ● Real-time prediction ● Streaming machine learning ● … more
  8. Stream Processing Market. Positives ● Analytics and machine learning use cases are shifting to stream processing ● Positive trends ○ Microservices and observability ○ Rise of IoT ○ Security analytics ○ ETL and messaging. Negatives ● A lack of proficient developers is slowing it down ● Success depends on the success of the analytics and integration market ● Market size ○ 300 ~ 500 million having 30%
  9. Building Streaming Apps. 1. Code it yourself + Customized for your requirement − A lot of glue code needs to be written 2. Stream Processors + Code only actors and data handlers + Can scale and handle failure − Hard to maintain and change 3. Graphical Tools + Good for novice users & can visualize the topology − Inefficient for advanced users 4. Streaming SQL + Good for advanced users + Easier to understand and faster to implement − Not easy to visualize the topology
  10. History of Stream Processing. Databases: Users query when they need data.
  11. History of Stream Processing. Databases: Users query when they need data. Active Databases: Users want to act when data meets a condition.
  12. History of Stream Processing. Databases: Users query when they need data. Active Databases: Users want to act when data meets a condition. TelegraphCQ (based on PostgreSQL): Long-running continuous queries over data streams.
  13. History of Stream Processing. TelegraphCQ (based on PostgreSQL): Long-running continuous queries over data streams. Complex Event Processing: Detect complex event patterns and correlations; runs on 1 or 2 nodes and is not scalable. E.g. SASE, Esper, Cayuga, and Siddhi (powers WSO2 SP), Apama, IBM InfoSphere. Stream Processing: Scalable processing of data using a graph of actors that runs on many nodes. E.g. Aurora, PIPES, STREAM, Borealis (academic).
  14. History of Stream Processing. Complex Event Processing: Detect complex event patterns and correlations; runs on 1 or 2 nodes and is not scalable. E.g. SASE, Esper, Cayuga, and Siddhi (powers WSO2 SP), Apama, IBM InfoSphere. Stream Processing: Scalable processing of data using a graph of actors that runs on many nodes. E.g. Aurora, PIPES, STREAM, Borealis (academic). Niche Applications: Stock markets, monitoring and alerts, & surveillance.
  15. History of Stream Processing. Niche Applications: Stock markets, monitoring and alerts, & surveillance. Stream Processing enters Big Data: Yahoo S4 (2010); Twitter Storm (2011), which was donated to Apache.
  16. History of Stream Processing. Niche Applications: Stock markets, monitoring and alerts, & surveillance. Stream Processing enters Big Data: Yahoo S4 (2010); Twitter Storm (2011), which was donated to Apache. Described as “like Hadoop, but in real-time”. Wide adoption and visibility: Spark Streaming, Samza, Flink.
  17. History of Stream Processing. Big Data Switched to SQL: Moving away from code-based MapReduce.
  18. History of Stream Processing. Big Data Switched to SQL: Moving away from code-based MapReduce. Stream Processing + CEP Merge: Support SQL over many nodes in real time.
  19. History of Stream Processing. Big Data Switched to SQL: Moving away from code-based MapReduce. Stream Processing + CEP Merge: Support SQL over many nodes in real time. Streaming SQL: Apache Storm, Apache Flink, WSO2 SP, Apache Kafka (KSQL), Apache Samza and Calcite.
  20. Streaming SQL. Source: https://tdwi.org/articles/2017/08/07/data-all-enabling-real-time-enterprise-with-data-streaming.aspx
  21. SQL vs Streaming SQL. SQL ● Works on a finite data table ● Queries run over static data ● Synchronous response. Streaming SQL ● Works on an infinite data table == data stream ● Data runs over static queries ● Asynchronous response
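The contrast on this slide can be sketched in plain Python. This is only an illustration under stated assumptions: `query_table`, `standing_query`, `is_large`, and the sample events (borrowed from the Coke/Fanta/Sprite example earlier in the deck) are hypothetical names, not part of Siddhi or any WSO2 API.

```python
from typing import Callable, Iterable, Iterator

Event = dict


def query_table(table: list, predicate: Callable[[Event], bool]) -> list:
    """SQL style: the table is finite, so the whole answer is returned at once."""
    return [row for row in table if predicate(row)]


def standing_query(stream: Iterable[Event],
                   predicate: Callable[[Event], bool]) -> Iterator[Event]:
    """Streaming style: the query stays in place and emits one result
    per matching event as the data flows past it."""
    for event in stream:
        if predicate(event):
            yield event


def is_large(event: Event) -> bool:
    # Hypothetical rule used only for this illustration.
    return event["amount"] > 15


EVENTS = [
    {"name": "Coke", "amount": 24},
    {"name": "Fanta", "amount": 14},
    {"name": "Sprite", "amount": 20},
    {"name": "Coke", "amount": 4},
]

# Synchronous: ask once, get the full result.
batch_result = query_table(EVENTS, is_large)

# Asynchronous: results appear one by one as events arrive.
live_results = standing_query(iter(EVENTS), is_large)
first_live_result = next(live_results)
```

The generator never needs the stream to end, which is why the same standing query can run over an unbounded source.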
  22. Siddhi Streaming SQL Overview. Diagram labels: Source/Sink & Streams, Queries, Tables
      @app:name('Factory-Analytics')
      @source(type = 'mqtt', …, @map(type = 'json', …))
      define stream ProductionStream (name string, amount double);
      @store(type = 'rdbms', …)
      @primaryKey('name')
      define table LastHourProductionTable (name string, cost double);
      from ProductionStream[amount > 0]#window.timeBatch(1 hour)
      select name, sum(amount) as cost
      group by name
      insert into LastHourProductionTable;
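To make the query on this slide concrete, here is a minimal Python sketch of what a filter followed by a time-batch (tumbling) window with a grouped sum computes. It is an assumption-laden illustration, not Siddhi's implementation: `time_batch_sum` is a hypothetical helper, events are assumed to arrive in timestamp order, windows are aligned to multiples of the window size, and the last window is flushed when the input ends.

```python
from collections import defaultdict

HOUR_MS = 3_600_000


def time_batch_sum(events, window_ms):
    """events: iterable of (timestamp_ms, name, amount), in timestamp order.

    Yields (window_start_ms, {name: total}) each time a window closes,
    plus the final open window once the input is exhausted.
    """
    current_start = None
    totals = defaultdict(float)
    for ts, name, amount in events:
        if amount <= 0:  # mirrors the [amount > 0] filter
            continue
        start = ts - ts % window_ms
        if current_start is None:
            current_start = start
        elif start != current_start:
            # The event belongs to a later window: emit the finished batch.
            yield current_start, dict(totals)
            totals = defaultdict(float)
            current_start = start
        totals[name] += amount  # mirrors sum(amount) ... group by name
    if current_start is not None:
        yield current_start, dict(totals)


sample = [
    (0, "Coke", 24.0),
    (1_000, "Fanta", 14.0),
    (2_000, "Coke", 4.0),
    (HOUR_MS + 5, "Sprite", 20.0),
    (HOUR_MS + 10, "Coke", -1.0),  # dropped by the filter
]
hourly = list(time_batch_sum(sample, HOUR_MS))
```

Each emitted dictionary corresponds to the rows that one batch would write into a table such as `LastHourProductionTable`.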
  23. Challenges. Source: https://www.pardot.com/blog/3-pressing-b2b-marketing-challenges-solved-with-marketing-automation/
  24. Challenges. In streaming SQL ● Not easy to visualize the topology ● Not easy for a business user to construct rules. In stream processing ● Inability to handle state ● Need multiple nodes ● Does not support online machine learning ● Does not support long-running aggregates in real time ● No visualization tools
  25. WSO2 Stream Processor
  26. WSO2 Stream Processor
  27. How Does WSO2 Stream Processor Solve Them?
  28. Developer Studio for Streaming Apps. Drag-and-drop query builder & source editor. Edit, Debug, Simulate, & Test, all in one place!
  29. Way to Visualize Topology ● Graphical stream SQL query editor ● Drag-and-drop support ● Switch between source & design views
  30. Citizen Integration for Streaming Data. Rule Building: build rule templates using the editor. Rule Config: configure rules via a form-based UI for non-technical users.
  31. Stream Processing at the Edge or Embedded • Stream processing at the sources – Being embedded in Java or Python applications – Being on the edge as a sidecar • Local decision making to build intelligent systems • ETL at the source • Event routing • Edge analytics. Diagram labels: Event Source, Stream Processor, Siddhi App, Event Store, Data Store, Dashboard, Notification, Invocation, Feedback
  32. High Availability with 2 Nodes • 2 node minimum HA – Process up to 100k events/sec – While most other stream processing systems need around 5+ nodes • Zero event loss • Incremental state persistence and recovery • Multi data center support. Diagram labels: Event Sources, Stream Processor (×2), Siddhi App, Event Store, Data Source, Dashboard, Notification, Invocation
  33. Distributed Deployment with Kafka. Diagram labels: Database, Event Source, Event Sink, Siddhi App (×10), Kafka Topic (×5)
  34. Scaling with Distributed Deployment • Exactly-once processing • Fault tolerance • Highly scalable • No back pressure • Distributed via annotations • Native support for Kubernetes
  35. Sample Distributed Siddhi App. Diagram labels: Source, Filter (×4), Partition (×2)
      @source(type = 'kafka', …, @map(type = 'json'))
      define stream ProductionStream (name string, amount double, factoryId int);
      @dist(parallel = '4', execGroup = 'gp1')
      from ProductionStream[amount > 100]
      select *
      insert into HighProductionStream;
      @dist(parallel = '2', execGroup = 'gp2')
      partition with (factoryId of HighProductionStream)
      begin
        from HighProductionStream#window.timeBatch(1 min)
        select factoryId, sum(amount) as amount
        group by factoryId
        insert into ProdRateStream;
      end;
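The two execution groups in this app can be pictured as a stateless filter stage followed by a keyed stage. The Python sketch below is a single-process illustration only: `high_production`, `route`, and `partitioned_sums` are hypothetical names, the modulo routing rule is an assumption rather than how WSO2 SP assigns partitions, and the one-minute window is left out so that a single batch is summed.

```python
from collections import defaultdict


def high_production(events, threshold=100):
    """Stage 1 (stateless): keep events whose amount exceeds the threshold.
    Any number of parallel copies could run this, since no state is shared."""
    return (e for e in events if e["amount"] > threshold)


def route(event, parallelism):
    """Assumed routing rule: the same factoryId always reaches the same worker."""
    return event["factoryId"] % parallelism


def partitioned_sums(events, parallelism=2):
    """Stage 2 (keyed): each worker sums only the factoryIds routed to it."""
    workers = [defaultdict(float) for _ in range(parallelism)]
    for e in events:
        workers[route(e, parallelism)][e["factoryId"]] += e["amount"]
    return [dict(w) for w in workers]


production = [
    {"name": "Coke", "amount": 150.0, "factoryId": 1},
    {"name": "Fanta", "amount": 80.0, "factoryId": 2},   # filtered out
    {"name": "Sprite", "amount": 120.0, "factoryId": 2},
    {"name": "Coke", "amount": 200.0, "factoryId": 1},
    {"name": "Coke", "amount": 110.0, "factoryId": 3},
]
per_worker = partitioned_sums(high_production(production), parallelism=2)
```

Because every event for a given `factoryId` lands on one worker, the per-factory sums stay correct even though no worker sees the whole stream.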
  36. Monitor Deployments. Monitor Resource Nodes and Siddhi Apps with the Status Dashboard • Understand performance via – Throughput – Latency – CPU and memory utilization • Monitor at various scales – Node level – Siddhi app level – Siddhi query level
  37. Challenge: Lack of Knowledge About the Future ● Serving pre-created ML models ○ PMML (built from Python, R, Spark, H2O.ai, etc.) ○ TensorFlow ● Online machine learning ○ Clustering ○ Classification ○ Regression ● Anomaly detection ○ Markov model ○ …more
  38. Challenge: Cannot Run Long-Running Aggregates ● Incremental aggregation ○ Aggregation for every second, minute, hour, … , year ● Built using λ architecture ● Real-time data in memory ● Historic data from disk ● Works with RDBMS data stores ● No big data storage is necessary
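The idea behind incremental aggregation, where coarser granularities are built from finer ones rather than from raw events, can be sketched as follows. This is a hedged illustration, not WSO2 SP's implementation: `per_second` and `roll_up` are hypothetical helpers, timestamps are whole seconds, and only sums are shown.

```python
from collections import defaultdict


def per_second(events):
    """Sum (timestamp_s, amount) pairs into one bucket per second."""
    buckets = defaultdict(float)
    for ts, amount in events:
        buckets[ts] += amount
    return dict(buckets)


def roll_up(buckets, factor):
    """Build coarser buckets from finer ones without rereading raw events.
    `factor` is the coarse bucket length in seconds."""
    coarse = defaultdict(float)
    for start, total in buckets.items():
        coarse[start - start % factor] += total
    return dict(coarse)


raw = [(0, 1.0), (0, 2.0), (59, 3.0), (60, 4.0), (3600, 5.0)]
seconds = per_second(raw)
minutes = roll_up(seconds, 60)
hours = roll_up(minutes, 3600)
```

Since each level only stores one value per bucket, older fine-grained buckets can be dropped once they are rolled up, which is consistent with the slide's point that no big data storage is necessary.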
  39. Visualization Support via Dashboards ● Widget generation ● Fine-grained permissions ○ Dashboard level ○ Widget level ○ Data level ● Localization ● Inter-widget communication ● Shareable dashboards
  40. Business Scenarios for Streaming. 1. Event-driven data integration 2. Real-time ETL 3. Generating event streams from passive data 4. Streaming data routing 5. Notification management 6. Real-time decision making 7. KPI monitoring 8. Citizen integration on streaming data 9. Dashboarding and reporting
  41. WSO2 Stream Processor ● Lightweight, lean, and high performance ● Best suited for ○ Streaming data integration ○ Streaming analytics ● Streaming SQL & graphical drag-and-drop editor ● Multiple deployment options ○ Process data at the edge (Java, Python) ○ Micro stream processing ○ High availability with 2 nodes ○ Highly scalable distributed deployments ● Support for streaming ML and long-running aggregations ● Monitoring tools and citizen integration options
  42. THANK YOU. wso2.com
