The Rise of Streaming SQL and Evolution of Streaming Applications

First-generation stream processors, such as Apache Storm, wanted us to write code. It was a great start. However, when building real-world apps, which are used for a long time and evolve, writing code gets us into trouble.

If we want to query a database or data stored in Hadoop, we use SQL. Why can't we query streaming data using SQL? We can. Almost all open source stream processors, including Storm, Flink, and Kafka, now support SQL.

In this webinar, Srinath will talk about the evolution of stream processing, streaming SQL, the status quo, and what this means to stream applications. He will also dissect the experience of building streaming applications by exploring common patterns and pitfalls.

Published in: Data & Analytics

  1. Srinath Perera, VP Research, WSO2, srinath@wso2.com. The Rise of Streaming SQL and Evolution of Streaming Applications
  2. What is Streaming? • A stream is a series of events • Query data streams • Detect conditions fast, within a short time of receiving the data (10 ms to 1 min), e.g. receive an alert by querying a data stream coming from a temperature sensor and detecting when the temperature has reached the freezing point.
  3. Almost all new data is Streaming. Almost all new data arrives as streams; even batch data is, at some point, potential streaming data. One can choose to consume it as streaming data or batch data based on the value of responding to it fast. • Transaction data • Log data • Sensor data • Health data • Traffic data
  4. Stream Processing Market. Market: 200-500m, 30% growth. A lack of proficient developers is slowing it. Success depends on analytics. Positive trends: • Microservices and observability • Security analytics • EDA and messaging. A lot of analytics and machine learning use cases will eventually shift to stream processing. Stream processing and IoT depend on each other.
  5. Building a Streaming App. Code it yourself: • Publish data to a message topic • Write an actor: subscribe, process, and put back to a topic. Use a stream processor: • You just write the actor, and the stream processor handles data flow, scale, and failures. Use a Streaming SQL based stream processor: • Just write Streaming SQL (discussed later)
  6. History of Stream Processing. It started with active databases: users wanted to act when data met a condition, e.g. TelegraphCQ (based on PostgreSQL). People thought about this outside of databases as well, which led to two branches: Stream Processing and Complex Event Processing.
  7. History of Stream Processing. Stream Processing: • Create a graph of actors and run them using many machines • e.g. Aurora, PIPES, STREAM, Borealis (academic). Complex Event Processing: • Provides a query language and focuses on efficient matching on 1-2 nodes • e.g. SASE, Esper, Cayuga, Siddhi (powers WSO2 SP), Apama, IBM InfoSphere. Niche applications: stock markets, monitoring and alerts, surveillance
  8. Stream Processing enters Big Data. Yahoo S4 (2010) and Twitter Storm (2011); both were donated to Apache. Described as “like Hadoop, but realtime”. Wide adoption and visibility. Also Spark Streaming, Samza, Flink.
  9. Rise of Streaming SQL. Apache-based SP engines used code as the API. Big Data switched from MapReduce to SQL. CEP and stream processing merged to support SQL over many nodes, giving Streaming SQL: Apache Storm, Apache Flink, WSO2 CEP -> WSO2 SP, Apache Kafka (KSQL), Apache Samza and Calcite.
  10. What is Streaming SQL? A stream is a never-ending table; think of a table to which new data (events) keeps being added. Example BoilerStream events (Time, bID, T): (07:23:30, B1, 210), (07:23:37, B1, 234), … Streaming SQL is SQL written against such a never-ending table: select bid, t*9/5 + 32 as tF from BoilerStream where t > 350. Unlike SQL, which returns data when the query is done, Streaming SQL outputs data as new events are added. You get a trigger whenever data matches.
  11. Why Streaming SQL? Easy to learn for the many people who know SQL. It is expressive, short, sweet, and fast. Manipulate streaming data declaratively without having to write code. Core operations cover 90% of use cases without code; the rest are handled via extensions. A query engine can better optimize execution with a Streaming SQL model.
  12. Common Solutions with SP. Detect a condition and trigger an alert that brings the user back to a dashboard. Detect a condition and update a dashboard. Detect a condition and trigger an action. The condition can be • a simple limit • a complex trend over time • correlations across streams • a machine learning model. Train an ML model, apply it over streaming data, and switch models as they drift. Calculate short-term values, store them long term in a database, and show a single view.
  13. Stream Processors are Stateful. Stream processors work from memory; that is the secret of their throughput of 50K-plus events per second. Most stream queries are stateful (e.g. patterns, windows, joins), and stream queries never end. When a stream processor fails, which it eventually must, the streaming app will lose state. To avoid this, stream processors must have HA.
  14. Most Stream Processors are Obese. Most famous stream processors come from large internet companies. Their use cases are large, and so are their deployments, so 5-plus nodes are not a problem for them. Most stream processors need 5+ nodes to set up an HA environment. However, given that a stream processor can do 50,000 events per second, most use cases need only one node. Then the minimal HA size matters.
  15. Stream Processing needs ML. Stream processing is the real-time extension of batch processing, so most batch ML use cases will apply in real time as well. Train the models offline and apply them online. When a model drifts from the data, retrain and swap the model. Alternatively, use streaming machine learning that learns on the fly.
  16. SP needs Advanced Query Authoring Environments. A lack of programmers who are comfortable with stream processing is holding it back. Stream processing queries are like regular expressions, which are • based on simple rules • very powerful • tough on new programmers. We need integrated development environments that let developers write, simulate, debug, trace, and verify queries.
  17. Stream Processors So Far. • Introduction to stream processing • Two branches: Stream Processing and CEP • Apache Storm and inclusion in Big Data • Rise of Streaming SQL • Stateful and need HA • Obese • Need ML • Need authoring tools
  18. WSO2 SP
  19. When to use WSO2 SP? • When you want to detect complex patterns over time • When you want to fuse data in motion and data at rest in the same application • When you are not sure about the final load (scale with Kafka using the same queries) • When you want to do ML within your queries • When your load is less than 100,000 events/sec (WSO2 SP supports this with just two nodes) • When you want your end users to tweak your queries
  20. Next Steps. Check out WSO2 Stream Processor. Learn about streaming applications with "13 Stream Processing Patterns for building Streaming Applications". Webinar: Distributed Stream Processing with WSO2 SP. Learn about Streaming SQL with "Streaming SQL 101". Webinar: WSO2 Stream Processor
  21. Questions? I write at https://medium.com/@srinathperera
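
To make the Streaming SQL idea from slide 10 concrete, here is a minimal sketch in plain Python of what a query such as "select bid, tF from BoilerStream where t > 350" does conceptually: it looks at each event as it arrives and emits a result immediately when the condition matches. This is not how WSO2 SP or any other engine is implemented. The function name `boiler_alerts`, the dictionary event layout, and the third sample event are assumptions made for illustration, and the Fahrenheit conversion uses the standard 9/5 factor.

```python
from typing import Dict, Iterable, Iterator


def boiler_alerts(events: Iterable[Dict], limit: float = 350.0) -> Iterator[Dict]:
    """Filter and project a stream of boiler events, one event at a time.

    Hypothetical stand-in for a streaming query: keep events whose
    temperature t exceeds the limit, and output bid plus t in Fahrenheit.
    """
    for event in events:  # the stream may never end; nothing is buffered
        if event["t"] > limit:
            # Emit a result as soon as a matching event arrives.
            yield {"bid": event["bid"], "tF": event["t"] * 9 / 5 + 32}


# Sample events: the first two mirror the slide's table; the third is invented
# so that at least one event crosses the 350 limit.
stream = [
    {"time": "07:23:30", "bid": "B1", "t": 210},
    {"time": "07:23:37", "bid": "B1", "t": 234},
    {"time": "07:23:45", "bid": "B2", "t": 360},
]

for alert in boiler_alerts(stream):
    print(alert)  # prints {'bid': 'B2', 'tF': 680.0}
```

Because `boiler_alerts` is a generator, it works the same way on an endless source (for example, a generator reading from a message topic) as on this finite list, which is the main difference from a batch query that returns only after all data has been read.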
