5. PRESENTED BY
Breaking up the Solution into Functional Blocks
Click data
Record all clicks Count clicks in real-time Query clicks by assets
2. Data Processing1. Data Ingest 3. Data Querying
6. PRESENTED BY
ClickAnalyzer
Redis Stream Redis Hash Spark SQLStructured Stream Processing
1. Data Ingest 2. Data Processing 3. Data Querying
The Actual Building Blocks of Our Solution
Click data
8. PRESENTED BY
ClickAnalyzer
Redis Stream Redis Hash Spark SQLStructured Stream Processing
1. Data Ingest 2. Data Processing 3. Data Querying
Data Ingest using Redis Streams
11. PRESENTED BY
Redis Streams can Connect Many Producers and Consumers
Producer 2
Producer m
Producer 1
Producer 3
Consumer 1
Consumer n
Consumer 2
Consumer 3
12. PRESENTED BY
Comparing Redis Streams with Redis Pub/Sub, Lists, Sorted Sets
Pub/Sub
• Fire and forget
• No persistence
• No lookback queries
Lists
• Tight coupling between
producers and consumers
• Persistence for transient
data only
• No lookback queries
Sorted Sets
• Data ordering isn’t built-in;
producer controls the order
• No maximum limit
• The data structure is not
designed to handle data
streams
13. PRESENTED BY
What is Redis Streams?
Pub/Sub Lists Sorted Sets
It is like Pub/Sub, but
with persistence
It is like Lists, but decouples
producers and consumers
It is like Sorted Sets, but
asynchronous
+
• Lifecycle management of streaming data
• Built-in support for timeseries data
• A rich choice of options to the consumers to read streaming and static data
• Super fast lookback queries powered by radix trees
• Automatic eviction of data based on the upper limit
14. PRESENTED BY
Redis Streams Benefits
Analytics
Data Backup
Consumers
Producer
Messaging
It enables asynchronous data exchange between producers and consumers
and historical range queries
15. PRESENTED BY
Redis Streams Benefits
Producer
Image Processor
Arrival Rate: 500/sec
Consumption Rate: 500/sec
Image Processor
Image Processor
Image Processor
Image Processor
Redis Stream
With consumer groups, you can scale out and avoid backlogs
PRESENTED BY
Redis Streams Benefits
Producer
Image Processor
Arrival Rate: 500/sec
Consumption Rate: 500/sec
Image Processor
Image Processor
Image Processor
Image Processor
Redis Stream
With consumer groups, you can scale out and avoid backlogs
16. PRESENTED BY
Classifier 1
Classifier 2
Classifier n
Consumer Group
XREADGROUP
XREAD
Consumers
Producer 2
Producer m
Producer 1
Producer 3
XADD
XACK
Deep Learning-based
Classification
Analytics
Data Backup
Messaging
Redis Streams Benefits
Simplify data collection, processing and
distribution to support complex scenarios
19. PRESENTED BY
ClickAnalyzer
Redis Stream Redis Hash Spark SQLStructured Stream Processing
1. Data Ingest 2. Data Processing 3. Data Querying
Data Processing using Spark’s Structured Streaming
21. PRESENTED BY
“Structured Streaming provides fast, scalable, fault-
tolerant, end-to-end exactly-once stream processing
without the user having to reason about streaming.”
Definition
23. PRESENTED BY
ClickAnalyzer
Redis Stream Redis HashStructured Stream Processing
Redis Streams as data source
Spark-Redis Library
Redis as data sink
§ Developed using Scala
§ Compatible with Spark 2.3 and higher
§ Supports
• RDD
• DataFrames
• Structured Streaming
24. PRESENTED BY
1. Connect to the Redis instance
2. Map Redis Stream to Structured Streaming schema
3. Create the query object
4. Run the query
Redis Streams as Data Source
25. PRESENTED BY
Redis Streams as Data Source
1. Connect to the Redis instance
val spark = SparkSession.builder()
.appName("redis-df")
.master("local[*]")
.config("spark.redis.host", "localhost")
.config("spark.redis.port", "6379")
.getOrCreate()
val clickstream = spark.readStream
.format("redis")
.option("stream.keys","clickstream")
.schema(StructType(Array(
StructField("img", StringType)
)))
.load()
val queryByImg = clickstream.groupBy("img").count
26. PRESENTED BY
Redis Streams as Data Source
2. Map Redis Stream to Structured Streaming schema
val spark = SparkSession.builder()
.appName("redis-df")
.master("local[*]")
.config("spark.redis.host", "localhost")
.config("spark.redis.port", "6379")
.getOrCreate()
val clickstream = spark.readStream
.format("redis")
.option("stream.keys","clickstream")
.schema(StructType(Array(
StructField("img", StringType)
)))
.load()
val queryByImg = clickstream.groupBy("img").count
xadd clickstream * img [image_id]
27. PRESENTED BY
Redis Streams as Data Source
3. Create the query object
val spark = SparkSession.builder()
.appName("redis-df")
.master("local[*]")
.config("spark.redis.host", "localhost")
.config("spark.redis.port", "6379")
.getOrCreate()
val clickstream = spark.readStream
.format("redis")
.option("stream.keys","clickstream")
.schema(StructType(Array(
StructField("img", StringType)
)))
.load()
val queryByImg = clickstream.groupBy("img").count
28. PRESENTED BY
Redis Streams as Data Source
4. Run the query
val clickstream = spark.readStream
.format("redis")
.option("stream.keys","clickstream")
.schema(StructType(Array(
StructField("img", StringType)
)))
.load()
val queryByImg = clickstream.groupBy("img").count
val clickWriter: ClickForeachWriter = new ClickForeachWriter("localhost","6379")
val query = queryByImg.writeStream
.outputMode("update")
.foreach(clickWriter)
.start()
query.awaitTermination()
Custom output sink
29. PRESENTED BY
Redis as Output Sink
override def process(record: Row) = {
var img = record.getString(0);
var count = record.getLong(1);
if(jedis == null){
connect()
}
jedis.hset("clicks:"+img, "img", img)
jedis.hset("clicks:"+img, "count", count.toString)
}
Create a custom class extending ForeachWriter and override the method, process()
Save as Hash with structure
clicks:[image]
img [image]
count [count]
Example
clicks:image_1001
img image_1001
count 1029
clicks:image_1002
img image_1002
count 392
.
.
.
.
img count
image_1001 1029
image_1002 392
. .
. .
Table: Clicks
37. PRESENTED BY
ClickAnalyzer
Redis Stream Redis Hash Spark SQLStructured Stream Processing
1. Data Ingest 2. Data Processing 3. Data Querying
Building Blocks of our Solution
Redis Streams as data source; Redis as data sinkSpark-Redis Library is used for:
41. PRESENTED BY
1 Problem statement and the demo
2 Building blocks of the solution:
• Data ingest: Redis Streams
• Data processing: Structured Streaming + Spark-Redis
• Data querying: Spark SQL + Redis
3 Recap
Agenda