State Management in Structured Streaming
Chandan Prakash
Copyright 2018 © Qubole
Agenda
● Structured Streaming: Brief Intro
● Types of Stream Processing: Stateless vs. Stateful
● State in Stream Processing
● State Store in Stream Processing
● State Management in Old Spark Streaming
● State Management in Structured Streaming
● Demo with Code Example
● Quiz, Food for Thought
What does this picture represent?
Image Source: google
Batch Processing vs. Stream Processing
Image Source: google
Structured Streaming: Brief Intro
Image Source: google
● Built on the Spark SQL engine
● Illusion: the stream of incoming data is treated as an unbounded input table, the processing logic as a SQL query, and the output of processing as a result table
● Internally, the query is converted into incremental micro-batch processing
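The "unbounded input table" illusion can be illustrated with a toy sketch in plain Python (not the Spark API; all names are hypothetical). The query is a count per word. Instead of re-running it over the whole table, each micro-batch updates a running count, and after every batch the result matches what the full query over all input received so far would return.

```python
from collections import Counter


def batch_query(all_rows):
    """The 'SQL query' over the whole input table: count rows per word."""
    return dict(Counter(all_rows))


def run_incrementally(micro_batches):
    """Process micro-batches one at a time, keeping only a running count as state."""
    state = Counter()
    seen = []  # kept only to check the answer against a full recomputation
    results = []
    for batch in micro_batches:
        state.update(batch)  # incremental step: touches only this batch's rows
        seen.extend(batch)
        result_table = dict(state)
        # Same answer as running the query over the unbounded table seen so far
        assert result_table == batch_query(seen)
        results.append(result_table)
    return results


for i, table in enumerate(run_incrementally([["cat", "dog"], ["dog"], ["cat", "cat", "owl"]])):
    print(f"after batch {i}: {table}")
```

A real engine does not keep `seen`; it exists here only to demonstrate that the incremental answer equals the full-table answer.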
Structured Streaming Query Example
Types of Stream Processing
● Stateless Streaming
○ Processing of every record is independent
○ Operations like map, filter
● Stateful Streaming
○ Processing of a record depends on previous records
○ Operations like counting records per distinct key, deduplicating records
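A minimal plain-Python sketch of the difference (hypothetical names, not Spark code): the map/filter step needs nothing but the current record, while deduplication and per-key counting must remember earlier records across micro-batches.

```python
def stateless_map_filter(records):
    """Stateless: drop empty records and upper-case the rest, one record at a time."""
    return [r.upper() for r in records if r]


class Deduplicator:
    """Stateful: emitting a record depends on the records seen before it."""

    def __init__(self):
        self.seen = set()  # state carried across micro-batches

    def process(self, records):
        out = []
        for r in records:
            if r not in self.seen:
                self.seen.add(r)
                out.append(r)
        return out


class CountPerKey:
    """Stateful aggregation: running count of records per distinct key."""

    def __init__(self):
        self.counts = {}  # state carried across micro-batches

    def process(self, records):
        for r in records:
            self.counts[r] = self.counts.get(r, 0) + 1
        return dict(self.counts)


dedup = Deduplicator()
print(stateless_map_filter(["a", "", "b"]))  # ['A', 'B']
print(dedup.process(["a", "b", "a"]))        # ['a', 'b']
print(dedup.process(["b", "c"]))             # ['c']: 'b' was seen in an earlier batch
```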
State in Stream Processing
● State of Streaming Progress
○ Metadata of stream processing: offsets
○ Keeps track of how much data has been processed so far
○ Needed for fault tolerance
○ Present in both stateless and stateful processing
● State of Data
○ Intermediate information carried across records
○ Operations like aggregation, deduplication
○ Present only in stateful processing
Note: In general, when we say “state”, we mean the state of data for processing. The other kind is called metadata/offsets.
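Why both kinds of state matter for fault tolerance can be shown with a toy model (plain Python, hypothetical names). A dict stands in for durable checkpoint storage holding the offset reached (state of progress) and the per-key counts (state of data); for simplicity, both are committed together after each batch. After a simulated crash, a restart resumes from the committed offset with the restored counts and ends with the same result as an uninterrupted run.

```python
import copy


def process_batch(source, start, end, counts):
    """Aggregate records source[start:end] into per-key counts (state of data)."""
    for key in source[start:end]:
        counts[key] = counts.get(key, 0) + 1


def run(source, batch_size, checkpoint, crash_after=None):
    """Run micro-batches, committing offset and state to `checkpoint` after each one.

    `checkpoint` is a dict standing in for durable storage; `crash_after`
    simulates a failure after that many batches of this run.
    """
    offset = checkpoint.get("offset", 0)                 # state of progress
    counts = copy.deepcopy(checkpoint.get("state", {}))  # state of data
    done = 0
    while offset < len(source):
        if crash_after is not None and done == crash_after:
            return None  # crash: only what was committed survives
        end = min(offset + batch_size, len(source))
        process_batch(source, offset, end, counts)
        offset = end
        checkpoint["offset"] = offset
        checkpoint["state"] = copy.deepcopy(counts)
        done += 1
    return counts


source = ["a", "b", "a", "c", "a", "b"]
ckpt = {}
run(source, 2, ckpt, crash_after=1)  # fails after one committed batch
print(ckpt["offset"])                # 2
print(run(source, 2, ckpt))          # {'a': 3, 'b': 2, 'c': 1}, same as a run with no crash
```

Without the stored offset, the restart would not know where to resume; without the stored counts, it would resume at the right place but with an empty aggregate.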
State Store in Streaming
● A reliable place to read and write intermediate data (state)
● Survives streaming failures so that processing can be restored from the same point
● Options: in-memory (e.g., an in-memory HashMap), file systems, storage systems
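The read/write contract of a state store can be sketched as a small interface. The method names are loosely modelled on the operations Spark's StateStore exposes (get, put, remove, commit, abort), but this Python code is a hypothetical illustration, not Spark's API. The in-memory HashMap version keeps a batch's updates pending until commit, so a failed batch can be aborted and processing restarts from the last committed version. Its weakness is that everything is lost if the process dies, which is why real stores persist state to durable storage.

```python
from abc import ABC, abstractmethod


class StateStore(ABC):
    """Minimal key-value state store contract (hypothetical sketch)."""

    @abstractmethod
    def get(self, key): ...

    @abstractmethod
    def put(self, key, value): ...

    @abstractmethod
    def remove(self, key): ...

    @abstractmethod
    def commit(self) -> int: ...

    @abstractmethod
    def abort(self) -> None: ...


class InMemoryStateStore(StateStore):
    """HashMap-backed store: fast, but lost if the process dies."""

    _DELETED = object()  # marker for keys removed in the batch in progress

    def __init__(self):
        self.version = 0
        self._committed = {}
        self._pending = {}  # updates of the batch in progress

    def get(self, key):
        value = self._pending.get(key, self._committed.get(key))
        return None if value is self._DELETED else value

    def put(self, key, value):
        self._pending[key] = value

    def remove(self, key):
        self._pending[key] = self._DELETED

    def commit(self):
        """Apply pending updates and return the new version number."""
        for key, value in self._pending.items():
            if value is self._DELETED:
                self._committed.pop(key, None)
            else:
                self._committed[key] = value
        self._pending = {}
        self.version += 1
        return self.version

    def abort(self):
        self._pending = {}  # discard the failed batch; committed state is untouched


store = InMemoryStateStore()
store.put("a", 1)
store.commit()          # version 1 holds {'a': 1}
store.put("a", 2)
store.abort()           # batch failed: roll back to the last committed version
print(store.version, store.get("a"))  # 1 1
```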
State Management in Old (DStream) Spark Streaming
● RDD-based streaming
● Inefficient, flawed design
○ State persisted together with offset metadata
○ Complete state snapshot persisted every micro-batch
○ Tightly coupled and synchronous with Spark RDD tasks
○ No provision for incremental state persistence
○ Processing overhead; a bottleneck as state grows
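The cost of persisting a complete snapshot every micro-batch, compared with incremental persistence, can be made concrete by counting state rows written under each strategy. This is a toy calculation in Python under stated assumptions (one row written per key, no compression or serialization costs), not a measurement of DStream or Structured Streaming.

```python
def persist_cost(batches):
    """Return (full_snapshot_rows, delta_rows) written over a list of micro-batches.

    Each batch is a list of (key, value) updates. Assumes one row written per key.
    """
    state = {}
    full_rows = 0
    delta_rows = 0
    for batch in batches:
        changed = set()
        for key, value in batch:
            state[key] = value
            changed.add(key)
        full_rows += len(state)     # full snapshot: entire state rewritten every batch
        delta_rows += len(changed)  # incremental: only keys touched in this batch
    return full_rows, delta_rows


warm_up = [[(k, 0) for k in range(1000)]]                # builds up 1,000 keys of state
small = [[(k, 1) for k in range(5)] for _ in range(10)]  # 10 batches touching 5 keys each
print(persist_cost(warm_up + small))  # (11000, 1050)
```

With 1,000 keys of state and batches that each touch 5 keys, the full-snapshot approach keeps rewriting all 1,000 rows per batch, while delta persistence writes only the 5 changed rows; the gap widens as state grows.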
State Management in Structured Streaming
A fundamental shift from old Spark Streaming:
● Decoupled from offsets/metadata checkpointing
● Asynchronous to Spark tasks/jobs (snapshotting and file cleanup run in the background)
● Incremental state persistence
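The asynchronous part can be sketched with a background daemon thread (plain Python; the class is hypothetical). The processing loop only hands each committed delta to a queue and moves on, while the maintenance thread folds deltas into a snapshot off the critical path. Spark's provider works with files in HDFS rather than an in-process queue; this toy only shows the decoupling.

```python
import queue
import threading


class AsyncSnapshotter:
    """Toy background maintenance that folds committed deltas into a snapshot.

    The processing thread only enqueues each committed delta (cheap) and moves
    on; a daemon thread applies the deltas to the snapshot off the critical path.
    Deltas here contain puts only.
    """

    def __init__(self):
        self.snapshot = {}
        self.snapshot_version = 0
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, version, delta):
        """Called by the processing loop after a batch commits; returns immediately."""
        self._queue.put((version, delta))

    def _run(self):
        while True:
            item = self._queue.get()
            if item is None:  # shutdown signal
                return
            version, delta = item
            self.snapshot.update(delta)
            self.snapshot_version = version

    def stop(self):
        """Let the thread finish all queued deltas, then wait for it to exit."""
        self._queue.put(None)
        self._thread.join()


maintenance = AsyncSnapshotter()
for version, delta in enumerate([{"a": 1}, {"b": 2}, {"a": 3}], start=1):
    maintenance.submit(version, delta)  # the processing loop never waits here
maintenance.stop()
print(maintenance.snapshot_version, maintenance.snapshot)  # 3 {'a': 3, 'b': 2}
```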
HDFS-backed State Management
1. In-memory HashMap + HDFS
2. Versioned key-value store per partition
3. Versioned delta file per partition
4. Each partition's task is scheduled (preferably) on the same executor where its previous state is held
5. Synchronous writes to the HashMap and the delta file output stream
6. An asynchronous daemon thread per executor for snapshotting and for purging/deleting files in HDFS
7. Only one thread in an executor can write to a delta file, but threads from multiple executors can try to write to the same delta file.
Code Entities in HDFS-backed State Management
● StatefulOperators
○ Define the computation logic to be executed against the state store for the set of rows in a partition.
● StateStoreOps
○ Prepares a StateStoreRDD for running computations against the state store, using the computation logic passed by the stateful operator.
● storeUpdateFunction
○ Contains the computation logic that defines what to do against the state store with the data of a partition task.
● HDFSBackedStateStore
○ Concrete implementation of the state store using a concurrent HashMap, backed by the HDFS file system for persistence.
● HDFSBackedStateStoreProvider
○ Contains methods to get a given store and to run maintenance tasks (snapshotting, purging and deleting files, cleaning up old states).
● StateStoreCoordinator
○ Helps schedule the task for a partition on the executor where its last versioned state is maintained in the HashMap (used as the task's preferred location).
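How these pieces fit together can be sketched in plain Python. The names echo the entities above (`StateStoreCoordinator`, a `storeUpdateFunction`-style callback that takes a store and an iterator of rows and returns output rows), but the code is a hypothetical illustration, not Spark's implementation. The coordinator remembers where each partition's store was last loaded, and the runner uses that as the preferred executor for the next micro-batch so the in-memory state can be reused.

```python
class StateStoreCoordinator:
    """Toy coordinator: remembers which executor last loaded each partition's store."""

    def __init__(self):
        self._locations = {}

    def report_active(self, partition, executor):
        self._locations[partition] = executor

    def preferred_executor(self, partition):
        return self._locations.get(partition)


def map_partitions_with_state_store(partitions, stores, coordinator,
                                    store_update_function, executors):
    """Run store_update_function(store, rows_iterator) once per partition.

    Each partition goes to the executor the coordinator prefers; with no known
    location yet, it falls back to round-robin over `executors`.
    """
    outputs = {}
    for pid, rows in enumerate(partitions):
        executor = coordinator.preferred_executor(pid) or executors[pid % len(executors)]
        coordinator.report_active(pid, executor)
        output_rows = list(store_update_function(stores[pid], iter(rows)))
        outputs[pid] = (executor, output_rows)
    return outputs


def count_per_key(store, rows):
    """A storeUpdateFunction-style callback: update the running count and emit it."""
    for key in rows:
        store[key] = store.get(key, 0) + 1
        yield key, store[key]


coordinator = StateStoreCoordinator()
stores = [{}, {}]  # one in-memory map per partition; stands in for per-executor HashMaps
print(map_partitions_with_state_store([["a", "b"], ["c"]], stores, coordinator,
                                      count_per_key, ["exec-1", "exec-2"]))
# Next micro-batch: even with the executor list reordered, partitions stay put
print(map_partitions_with_state_store([["a"], ["c", "c"]], stores, coordinator,
                                      count_per_key, ["exec-2", "exec-1"]))
```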
Code Flow of Stateful Structured Streaming
Quiz Time
Possible issues with the HDFS-backed implementation in production?
● State is constrained by executor memory
● The same executor memory has to be shared with RDD computation
● A single daemon thread is responsible for snapshotting all the state hashmaps, cleaning up files, etc.
In-Memory HashMap: Possible Solution?
Food for Thought
RocksDB: Embedded/Local Store
● Key-value embedded data store
● Improved fork of LevelDB, open-sourced by Facebook
● Brings the database close to the processing
● Pros:
○ No memory issues (unlike an in-memory HashMap)
○ No network latency (unlike a remote store such as Cassandra)
○ Fast writes: in-memory buffer + sequential transaction log
○ Isolation
● Cons:
○ Not distributed
○ Not replicated
○ Maintenance overhead; uses non-JVM (off-heap) memory
● Architecture:
○ Memtable: in-memory buffer
○ Change log
○ SST (sorted string) tables on disk
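The architecture above (memtable, change log, SST tables) can be sketched as a toy log-structured store in plain Python. This is not RocksDB's API and it omits compaction, bloom filters, and real disk I/O; lists stand in for the on-disk log and SST files, and the class name is hypothetical. Writes are appended to the change log and buffered in the memtable; a full memtable is flushed as an immutable, key-sorted table, and reads check the memtable first, then tables from newest to oldest.

```python
import bisect


class TinyLSM:
    """Toy log-structured store: change log + memtable + immutable sorted tables."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit
        self.change_log = []  # append-only record of writes (sequential I/O on disk)
        self.memtable = {}    # in-memory buffer of recent writes
        self.sst_tables = []  # flushed key-sorted runs, oldest first

    def put(self, key, value):
        self.change_log.append((key, value))  # log first so a crash can replay it
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        self.sst_tables.append(sorted(self.memtable.items()))  # SST: sorted by key
        self.memtable = {}
        # In a real store the SST is now on disk, so the log can be truncated
        self.change_log = []

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.sst_tables):  # newest table wins
            keys = [k for k, _ in run]
            i = bisect.bisect_left(keys, key)  # binary search in the sorted run
            if i < len(keys) and keys[i] == key:
                return run[i][1]
        return None


db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 1)   # memtable full: flushed to the first SST table
db.put("a", 2)   # newer value stays in the memtable
print(db.get("a"), db.get("b"), len(db.sst_tables))  # 2 1 1
```

Keeping writes sequential (log append, then whole sorted flushes) is what makes the "fast writes" point above possible on both spinning disks and SSDs.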
Image Source: google
RocksDB in Streaming Systems
● Apache Flink
https://flink.apache.org/features/2018/01/30/incremental-checkpointing.html
● Apache Samza
https://samza.apache.org/learn/documentation/0.7.0/container/state-management.html
● Kafka Streams
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Internal+Data+Management
Summary
● What stateful processing and state in streaming are
● Architecture of state management for stateful processing in Structured Streaming
● Code example
● Why an embedded store like RocksDB is so important in stream processing
Thank You. Questions?
Qubole Blog: https://www.qubole.com/blog/

Editor's Notes

  • #2 How many of you have an idea about streaming, have worked on any streaming system, or understand the term “state management”? ... This should be useful for every one of you. State is information about past input that can be used to influence the processing of future input; we will see this in detail. Feel free to ask questions at any point during the presentation.
  • #3 Why would you like to listen to this? Although the talk is specific to Spark Structured Streaming, the design, architecture, concepts, and the thought process behind why things are the way they are will give you a good understanding of any streaming technology. They are all like distant cousins in the same family, and you will see many overlaps between different streaming systems. Understanding one helps you understand the others; many of them copy from, or let's say are inspired by, each other. This will give you the perspective of a streaming engine developer.
  • #4 Quick question: what do you infer from this picture?
  • #5 This pretty much sums up the difference between batch and stream processing. In batch processing, data is at rest and you take a chunk of data each time you process. In streaming, data keeps arriving and you need to process it as and when it comes.
  • #7 We will see a running version of this example on a Qubole Notebook after covering state management. START THE CLUSTER. The objective of showing this code example is to give you an idea of stateful processing, so that when we talk about state management you can relate to it and understand it easily.
  • #8 Having given a rough idea of Structured Streaming, let's start with the actual topic we want to discuss today. By analogy to SQL, the select and where clauses of a query are usually stateless, but join, group by, and aggregation functions like sum and count require state.
  • #9 Intermediate information in stream processing. State of progress: offsets/commits.
  • #11 A design is often easier to understand when compared with its predecessor. Evolution is a constant process; something new comes about because of the limitations of the old. Story about my experience with stateless stream processing: maintaining offsets in ZooKeeper.
  • #12 This is the main meat of this talk, which I want to go into in detail.
  • #13 I prepared this diagram based on my understanding of the internal code and how it works in the upcoming Spark 2.4. It is very important to note that concepts like incremental checkpointing and asynchronous state management are not specific to Spark Streaming; you will find them in other streaming systems like Flink as well, under different names.
  • #14 This slide is for those interested in checking out the code themselves: the classes, interfaces, and methods involved in state management. I won't go into detail; instead, I will show the code flow of state management on the next slide.
  • #15 The stateful operator is where the logic to interact with the state store resides. Show code.
  • #16 Before I go forward, do you have any questions here? Because now I have a question for you.
  • #17 Do you see any possible issues with this architecture? Honestly, I have not encountered any, but let's discuss what the possible issues with this approach could be.
  • #18 Go back to the architecture diagram.
  • #19 I intentionally did not talk about RocksDB at the start; now is the time. I really wanted to talk about this embedded storage, or local persistent store.
  • #20 Why embedded storage? It became popular in the flash memory/SSD era, when writing to local disks became much faster than the client-server model of going over the network to storage systems. Sequential read/write: the airport conveyor belt analogy for spinning disks, where latency comes from the rotation and the seek time needed to reach the right sector of the data. Hadoop was about moving processing closer to data; RocksDB is about moving the database closer to processing. Improved LevelDB: multithreaded writes and compaction, support for bloom filters when reading data, improved compaction logic similar to HBase.
  • #21 RocksDB is present in almost every recent streaming system that needs to keep unlimited state without the penalty of a network call. Storm currently does not use local storage like RocksDB; it still relies on remote stores like Redis, HBase, and Cassandra. Samza: at LinkedIn, features like the personalized feed sent to your wall are decided by joining a lot of information with the available feed using Samza. Kafka and Samza were written by the same people at LinkedIn, who later went on to found a company called Confluent, where they wrote Kafka Streams. So you will find many similarities.
  • #22 As I said in the beginning, understanding one system will help us understand the others. Understanding RocksDB is one example; incremental checkpointing, snapshotting, and asynchronous state management are other such concepts. Technologies and implementations might differ, but after all they are trying to solve similar problems of the distributed world, which has the same challenges, limitations, and expectations everywhere, such as fault tolerance and exactly-once processing.
  • #23 Please keep a close watch on Qubole Engineering. We write a lot of interesting material on big data in the cloud, Spark, the open-sourced SparkLens, tuning, Hive, Presto, and AWS.