[WSO2Con USA 2018] The Rise of Streaming SQL

This slide deck explores trends in stream processing, how streaming SQL has become a standard, the advantages of streaming SQL and more.

View video: https://wso2.com/library/conference/2018/07/wso2con-usa-2018-the-rise-of-streaming-sql/

Published in: Technology

  1. 1. The Rise of Streaming SQL. Sriskandarajah Suhothayan, Director, WSO2
  2. 2. What is Streaming Data? A series of events/data having the same schema/format, appearing continuously. Example events: Coke 24, Fanta 14, Sprite 20, Coke 4. The same events as XML: <coke>24</coke> <fanta>14</fanta> <sprite>20</sprite> <coke>4</coke>
  3. 3. Almost All Data is Streaming! All data is generated one event at a time, so even batch data was streaming at some point ● Logs ● Transaction data ● Sensor data ● Traffic data. Data is streaming at the source!
  4. 4. Streaming Data Processing ● Process data at the source, or before storing it ● Identify insights in real time and act immediately ● Reduce unnecessary data storage and batch processing [Diagram: Logs, Sensors, Devices, Apps, Services → Stream Processing → Alerts, Dashboards, Services, Databases]
  5. 5. Streaming Data Processing Operations ● Event-driven architecture ● Streaming data integration ● Streaming data preprocessing ● Data store integration ● Service integration ● Streaming data summarization ● KPI analysis and alerts ● Event correlation ● Pattern matching ● Trend analysis ● Real-time prediction ● Streaming machine learning ● … more
  6. 6. Stream Processing Market. Positives ● Analytics and machine learning use cases are shifting to stream processing ● Positive trends ○ Microservices and observability ○ Rise of IoT ○ Security analytics ○ ETL and messaging. Negatives ● A lack of proficient developers is slowing it down ● Success depends on the success of the analytics and integration market ● Market size ○ 300 ~ 500 million having 30%
  7. 7. Building Streaming Apps. 1. Code it yourself + Customized for your requirements − A lot of glue code needs to be written 2. Stream Processors + Code only actors and data handlers + Can scale and handle failure − Hard to maintain and change 3. Graphical Tools + Good for novice users & can visualize the topology − Inefficient for advanced users 4. Streaming SQL + Good for advanced users + Easier to understand and faster to implement − Not easy to visualize the topology
  8. 8. History of Stream Processing Databases: Users query when they need data
  9. 9. History of Stream Processing Databases: Users query when they need data Active Databases: Users want to act when data meets a condition
  10. 10. History of Stream Processing Databases: Users query when they need data Active Databases: Users want to act when data meets a condition TelegraphCQ (based on PostgreSQL): Long-running continuous queries over data streams
  11. 11. History of Stream Processing TelegraphCQ (based on PostgreSQL): Long-running continuous queries over data streams Complex Event Processing: Detects complex event patterns and correlations; runs on 1 or 2 nodes and is not scalable. E.g. SASE, Esper, Cayuga, Siddhi (powers WSO2 SP), Apama, IBM InfoSphere. Stream Processing: Scalable processing of data using a graph of actors running on many nodes. E.g. Aurora, PIPES, STREAM, Borealis (academic)
  12. 12. History of Stream Processing Complex Event Processing: Detects complex event patterns and correlations; runs on 1 or 2 nodes and is not scalable. E.g. SASE, Esper, Cayuga, Siddhi (powers WSO2 SP), Apama, IBM InfoSphere. Stream Processing: Scalable processing of data using a graph of actors running on many nodes. E.g. Aurora, PIPES, STREAM, Borealis (academic) Niche Applications: Stock markets, monitoring and alerts, & surveillance
  13. 13. History of Stream Processing Niche Applications: Stock markets, monitoring and alerts, & surveillance Stream Processing Enters Big Data: Yahoo S4 (2010), Twitter Storm (2011) was donated to Apache
  14. 14. History of Stream Processing Niche Applications: Stock markets, monitoring and alerts, & surveillance Stream Processing Enters Big Data: Yahoo S4 (2010), Twitter Storm (2011) was donated to Apache. Described as “like Hadoop, but in real-time” Wide adoption and visibility: Spark Streaming, Samza, Flink
  15. 15. History of Stream Processing Big Data Switched to SQL: From code-based MapReduce
  16. 16. History of Stream Processing Big Data Switched to SQL: From code-based MapReduce Stream Processing + CEP Merge: Support SQL over many nodes in real-time
  17. 17. History of Stream Processing Big Data Switched to SQL: From code-based MapReduce Stream Processing + CEP Merge: Support SQL over many nodes in real-time Streaming SQL: Apache Storm, Apache Flink, WSO2 SP, Apache Kafka (KSQL), Apache Samza and Calcite
  18. 18. Streaming SQL (image). Source: https://tdwi.org/articles/2017/08/07/data-all-enabling-real-time-enterprise-with-data-streaming.aspx
  19. 19. SQL vs Streaming SQL. SQL ● Works on a finite data table ● Queries run over static data ● Synchronous response. Streaming SQL ● Works on an infinite data table (a data stream) ● Data runs over static queries ● Asynchronous response
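The contrast on slide 19 can be sketched in plain Python. This is only an illustration, not how any streaming SQL engine is implemented: the function names, the event shape, and the threshold of 15 are assumptions made for the example. The sample events are the ones shown on slide 2.

```python
from typing import Iterable, Iterator

# Sample events taken from slide 2; the dict shape is an assumption.
events = [
    {"name": "Coke", "amount": 24},
    {"name": "Fanta", "amount": 14},
    {"name": "Sprite", "amount": 20},
    {"name": "Coke", "amount": 4},
]


def batch_query(table: list) -> list:
    """SQL style: the query runs once over a finite table and returns synchronously."""
    return [row for row in table if row["amount"] > 15]


def continuous_query(stream: Iterable[dict]) -> Iterator[dict]:
    """Streaming style: the query stays fixed and data flows through it.

    A result is emitted whenever a matching event arrives, so the input
    may be unbounded.
    """
    for event in stream:
        if event["amount"] > 15:
            yield event


batch_result = batch_query(events)

streamed_result = []
for match in continuous_query(iter(events)):
    # In a real deployment this would be a callback, sink, or alert.
    streamed_result.append(match)
```

Both paths select the same rows here because the input is finite; the difference is that the generator never needs to see the end of the data before producing output.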
  20. 20. Siddhi Streaming SQL Overview
       Source/Sink & Streams:
         @app:name('Sweet-Factory-Analytics')
         @source(type = 'mqtt', …, @map(type = 'json', …))
         define stream SweetProductionStream(name string, amount double);
       Queries:
         from SweetProductionStream[amount < 100 and name == 'candy']
         select name, sum(amount) as cost
         group by name
         insert into LowCostCandyProductionStream;
       Tables:
         @store(type = 'rdbms', …)
         @primaryKey('id')
         @Index('amount')
         define table ProductionTable(name string, cost double);
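To make the query on slide 20 concrete, here is a rough Python equivalent of its filter, group-by, and sum. It assumes that an aggregate without a window keeps a running total per group and emits one output event per matching input event; the function name and the sample input are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator, Tuple


def low_cost_candy(stream: Iterable[Tuple[str, float]]) -> Iterator[dict]:
    """Emulates: filter [amount < 100 and name == 'candy'],
    then select name, sum(amount) as cost, grouped by name."""
    totals = defaultdict(float)  # running sum per group key (assumed semantics)
    for name, amount in stream:
        if amount < 100 and name == "candy":
            totals[name] += amount
            # One output event per matching input event.
            yield {"name": name, "cost": totals[name]}


# Hypothetical production events as (name, amount) pairs.
production = [("candy", 40.0), ("toffee", 10.0), ("candy", 150.0), ("candy", 25.0)]
output = list(low_cost_candy(production))
```

The toffee event fails the name check and the 150.0 candy event fails the amount check, so only two output events are produced.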
  21. 21. Challenges (image). Source: https://www.pardot.com/blog/3-pressing-b2b-marketing-challenges-solved-with-marketing-automation/
  22. 22. Challenges. In streaming SQL: ● Not easy to visualize the topology. In stream processing: ● Inability to handle state ● Needs multiple nodes ● Does not support online machine learning ● Does not support long-running aggregates in real time
  23. 23. WSO2 Stream Processor
  24. 24. WSO2 Stream Processor
  25. 25. How Does WSO2 Stream Processor Solve Them?
  26. 26. Challenge: Not Easy to Visualize Topology ● Graphical streaming SQL query editor ● Drag & drop support ● Switch between source and design views
  27. 27. Challenge: Handle State & Need for Multi Nodes • 2-node minimum HA – Processes up to 100k events/sec – While most other stream processing systems need around 5+ nodes • Scale more with Kafka • Incremental state persistence and recovery [Diagram labels: Event Sources, Event Store, Data Source, two Stream Processor nodes running Siddhi Apps, Dashboard, Notification, Invocation]
  28. 28. Challenge: Lack of Knowledge About Future. Running PMML models for predictions ● Build PMML models via Apache Spark MLlib, H2O.ai, R or Python ● Load the built PMML model into Siddhi and predict in real time. Supporting native prediction models: ● Spark MLlib models, and Java-based TensorFlow models. Online learning and predictions ● Regression analytics ● Markov models ● Anomaly detection ● K-Means clustering ● … more
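Slide 28 lists anomaly detection among the online learning options. As a generic illustration of what "online" means here, the sketch below scores each value against statistics learned so far and then updates them, using Welford's running mean and variance. This is a swapped-in textbook technique, not the algorithm WSO2 Stream Processor uses; the class name, threshold, warm-up length, and readings are all assumptions.

```python
import math


class RunningAnomalyDetector:
    """Flags values far from the running mean; learns from every event."""

    def __init__(self, threshold: float = 3.0, warmup: int = 5):
        self.threshold = threshold  # allowed distance, in standard deviations
        self.warmup = warmup        # events to observe before flagging anything
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0               # sum of squared deviations (Welford)

    def update(self, x: float) -> bool:
        # Score first, using only what has been seen so far.
        is_anomaly = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            is_anomaly = std > 0 and abs(x - self.mean) > self.threshold * std
        # Then learn from the new value (Welford's update).
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_anomaly


detector = RunningAnomalyDetector()
readings = [10, 11, 9, 10, 12, 10, 11, 50]  # hypothetical sensor values
flags = [detector.update(r) for r in readings]
```

No model is trained offline: the state is three numbers that are updated per event, which is what allows this style of learning to run inside a stream.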
  29. 29. Challenge: Cannot Run Long-Running Aggregates ● Incremental aggregation ○ Aggregation for every second, minute, hour, … , year ● Built on top of architecture ● No big data storage is necessary ● Current values in memory and others on disk ● Executed in a single query [Diagram: second, minute, and hour buckets, with the current minute and current hour marked]
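The incremental aggregation idea on slide 29 can be sketched as a cascade of time buckets: the open bucket at each granularity is held in memory, and when it closes it is stored and rolled up into the next coarser level. This is a minimal sketch under stated assumptions, not WSO2's implementation: events arrive in timestamp order, only sum and count are tracked, the `finished` lists stand in for disk storage, and all names are hypothetical.

```python
class IncrementalAggregator:
    # (level name, bucket width in seconds), from fine to coarse
    LEVELS = (("sec", 1), ("min", 60), ("hour", 3600))

    def __init__(self):
        self.current = {}  # level -> [bucket_start, total, count], kept in memory
        self.finished = {name: [] for name, _ in self.LEVELS}  # stand-in for disk

    def add(self, ts: int, value: float) -> None:
        """Feed one event; assumes non-decreasing timestamps in seconds."""
        self._merge(0, ts, value, 1)

    def _merge(self, level: int, ts: int, total: float, count: int) -> None:
        name, width = self.LEVELS[level]
        start = ts - ts % width
        cur = self.current.get(name)
        if cur is not None and cur[0] != start:
            # The open bucket has closed: store it and roll it up one level.
            self.finished[name].append(tuple(cur))
            if level + 1 < len(self.LEVELS):
                self._merge(level + 1, cur[0], cur[1], cur[2])
            cur = None
        if cur is None:
            cur = self.current[name] = [start, 0.0, 0]
        cur[1] += total
        cur[2] += count


agg = IncrementalAggregator()
for ts, value in [(0, 1.0), (0, 2.0), (1, 3.0), (61, 4.0), (3600, 5.0)]:
    agg.add(ts, value)
```

Coarser levels are built from already-aggregated finer buckets rather than from raw events, so no raw event history needs to be kept to answer per-minute or per-hour queries.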
  30. 30. When to Use WSO2 Stream Processor. 1. Start with 2 nodes and scale without changing queries 2. Detect complex event patterns over time 3. Run machine learning models to perform online learning 4. Fuse data in motion and data at rest 5. Perform aggregations from seconds to years 6. Let end users tweak queries 7. Achieve real-time ETL 8. Run rule-based decision making 9. … more
  31. 31. THANK YOU wso2.com
