This presentation covers real-world Pulsar architectural patterns involving distributed caching and distributed tracing. We also cover the use of Apache Ignite, Jaeger, Apache Flink, and many other technologies, as well as industry best practices.
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies? (HostedbyConfluent)
Microservices became the new black in enterprise architectures. APIs provide functions to other applications or end users. Even if your architecture uses a pattern other than microservices, like SOA (Service-Oriented Architecture) or client-server communication, APIs are used between the different applications and end users.
Apache Kafka plays a key role in modern microservice architectures to build open, scalable, flexible and decoupled real-time applications. API Management complements Kafka by providing a way to implement and govern the full life cycle of the APIs.
This session explores how event streaming with Apache Kafka and API Management (including API Gateway and Service Mesh technologies) complement and compete with each other depending on the use case and point of view of the project team. The session concludes by exploring the vision of event streaming APIs instead of RPC calls.
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with A... (confluent)
Microservices, events, containers, and orchestrators are dominating our vernacular today. As operations teams adapt to support these technologies in production, cloud-native platforms like Pivotal Cloud Foundry and Kubernetes have quickly risen to serve as force multipliers of automation, productivity and value.
Apache Kafka® is providing developers with a critically important component as they build and modernize applications for cloud-native architecture.
This talk will explore:
• Why cloud-native platforms and why run Apache Kafka on Kubernetes?
• What kind of workloads are best suited for this combination?
• Tips to determine the path forward for legacy monoliths in your application portfolio
• Demo: Running Apache Kafka as a Streaming Platform on Kubernetes
Integrating Apache Kafka Into Your Environment (confluent)
Watch this talk here: https://www.confluent.io/online-talks/integrating-apache-kafka-into-your-environment-on-demand
Integrating Apache Kafka with other systems in a reliable and scalable way is a key part of an event streaming platform. This session will show you how to get streams of data into and out of Kafka with Kafka Connect and REST Proxy, maintain data formats and ensure compatibility with Schema Registry and Avro, and build real-time stream processing applications with Confluent KSQL and Kafka Streams.
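As a rough illustration of the Kafka Connect part of that workflow, the sketch below registers a file source connector through the Connect REST interface. The worker address, file path, and topic name are assumptions for the example; the session itself may use different connectors.

```python
import json
import urllib.request

# Hypothetical Connect worker address; adjust for your environment.
CONNECT_URL = "http://localhost:8083/connectors"

# FileStreamSource ships with Apache Kafka and is handy for demos:
# it tails a file and produces each line to a topic.
connector = {
    "name": "demo-file-source",
    "config": {
        "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
        "tasks.max": "1",
        "file": "/tmp/demo-input.txt",
        "topic": "demo-lines",
    },
}

req = urllib.request.Request(
    CONNECT_URL,
    data=json.dumps(connector).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(resp.status, resp.read().decode())
```

The same REST endpoint lists and deletes connectors, which is why Connect deployments are easy to automate.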
This session is part 4 of 4 in our Fundamentals for Apache Kafka series.
Neo4j Graph Streaming Services with Apache Kafka (jexp)
In this presentation we give a high-level overview of the Neo4j-Kafka integration and the Confluent partnership.
Providing change-data-capture and ingestion capabilities as a Neo4j extension and as the Kafka Connect Neo4j Sink on Confluent Hub allows you to integrate real-time streaming with graph querying and analytics.
Making Kafka Cloud Native | Jay Kreps, Co-Founder & CEO, Confluent (HostedbyConfluent)
A talk discussing the rise of Apache Kafka and data in motion, plus the impact of cloud-native data systems. This talk will cover how Kafka needs to evolve to keep up with the future of cloud, what this means for distributed systems engineers, and what work is being done to truly make Kafka cloud native.
MongoDB .local London 2019: Streaming Data on the Shoulders of Giants (Lisa Roth, PMP)
Life doesn't happen in batch mode, which is why application engineers and data architects need to cooperate closely to get the best out of streaming platforms like Apache Kafka and NoSQL data stores such as MongoDB. This session explores ways and means to integrate both worlds in a streaming fashion.
Can and should Apache Kafka replace a database? How long can and should I store data in Kafka? How can I query and process data in Kafka? These are common questions that come up more and more. This session explains the idea behind databases and different features like storage, queries, transactions, and processing to evaluate when Kafka is a good fit and when it is not.
The discussion includes different Kafka-native add-ons like Tiered Storage for long-term, cost-efficient storage and ksqlDB as an event streaming database. The relation and trade-offs between Kafka and other databases are explored so that they can complement each other instead of one replacing the other. This includes different options for pull- and push-based bi-directional integration.
Key takeaways:
- Kafka can store data forever in a durable and highly available manner
- Kafka has different options to query historical data
- Kafka-native add-ons like ksqlDB or Tiered Storage make Kafka more powerful than ever before to store and process data
- Kafka provides exactly-once semantics and lightweight producer transactions, but not database-style ACID transactions (see the sketch after this list)
- Kafka is not a replacement for existing databases like MySQL, MongoDB or Elasticsearch
- Kafka and other databases complement each other; the right solution has to be selected for each problem
- Different options are available for bi-directional pull and push-based integration between Kafka and databases to complement each other
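To make the exactly-once takeaway concrete, here is a minimal sketch of a transactional producer using the confluent-kafka Python client; the broker address, transactional id, and topic are assumptions. Messages in a committed transaction become visible to read_committed consumers atomically.

```python
from confluent_kafka import Producer

# Transactional producer config; broker address and topic are assumptions.
producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "transactional.id": "demo-eos-producer",  # enables idempotence + transactions
})

producer.init_transactions()
producer.begin_transaction()
try:
    for i in range(10):
        producer.produce("demo-topic", key=str(i), value=f"event-{i}")
    # Either all ten messages become visible to read_committed consumers, or none do.
    producer.commit_transaction()
except Exception:
    producer.abort_transaction()
    raise
```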
Video Recording:
https://youtu.be/7KEkWbwefqQ
Blog post:
https://www.kai-waehner.de/blog/2020/03/12/can-apache-kafka-replace-database-acid-storage-transactions-sql-nosql-data-lake/
From data stream management to distributed dataflows and beyond (Vasia Kalavri)
Recent efforts by academia and open-source communities have established stream processing as a principal data analysis technology across industry. All major cloud vendors offer streaming dataflow pipelines and online analytics as managed services. Notable use cases include real-time fault detection in space networks, city traffic management, dynamic pricing for car-sharing, and anomaly detection in financial transactions. At the same time, streaming dataflow systems are increasingly being used for event-driven applications beyond analytics, such as orchestrating microservices and model serving. In the past decades, streaming technology has evolved significantly; however, emerging applications are once more challenging the design decisions of modern streaming systems. In this talk, I will discuss the evolution of stream processing and bring current trends and open problems to the attention of our community.
apidays LIVE India - REST the Events - REST APIs for Event-Driven Architecture (apidays)
apidays LIVE India 2021 - Connecting 1.3 billion digital innovators
May 20, 2021
REST the Events - REST APIs for Event-Driven Architecture
Mark Teehan, Principal Solution Engineer at Confluent APAC
Self-service Events & Decentralised Governance with AsyncAPI: A Real World Ex... (HostedbyConfluent)
Despite great advances in Kafka's SaaS offerings, it can still be challenging to create a sustainable event-driven ecosystem. Often platform engineers become de facto ‘gatekeepers’ of events & topics, yet their day job is not about data modelling or domain expertise. We've all seen the bottlenecks these unsustainable processes create.
Realising the potential of event streams requires much more than infrastructure. Beyond an event-driven mindset, it requires domain experts to lead creation of well-defined discoverable events through fit-for-purpose governance. AsyncAPI is the OpenAPI for events that can form the basis of the required self-governing, self-service eventing framework.
This session will introduce a self-governing framework using AsyncAPI and share how the Bank of New Zealand applied this framework to leverage a passionate Kafka community and embed event-driven thinking. You’ll leave with a tangible set of ideas to give your own events a bit more swagger using AsyncAPI.
IoT Sensor Analytics with Kafka, ksqlDB and TensorFlow (Kai Wähner)
Use cases and architectures for IoT projects leveraging Apache Kafka, ksqlDB, machine learning / deep learning frameworks like TensorFlow, and cloud infrastructure.
Large numbers of IoT devices lead to big data and the need for further processing and analysis. Apache Kafka is a highly scalable and distributed open source streaming platform, which can connect to MQTT and other IoT standards. Kafka ingests, stores, processes and forwards high volumes of data from thousands of IoT devices.
The rapidly expanding world of stream processing can be daunting, with new concepts such as various types of time semantics, windowed aggregates, changelogs, and programming frameworks to master. KSQL is the streaming SQL engine on top of Apache Kafka which simplifies all this and makes stream processing available to everyone without the need to write source code.
This talk shows how to leverage Kafka and KSQL in an IoT sensor analytics scenario for predictive maintenance and integration with real time monitoring systems. A live demo shows how to embed and deploy Machine Learning models - built with frameworks like TensorFlow, DeepLearning4J or H2O - into mission-critical and scalable real time applications.
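As a flavour of the KSQL described above, the following sketch submits a stream definition and a windowed aggregation to a KSQL/ksqlDB server over its REST API. The server address and stream names are invented, and the exact SQL dialect varies between KSQL and ksqlDB versions.

```python
import json
import urllib.request

# Hypothetical ksqlDB/KSQL server address; stream and topic names are made up.
KSQL_URL = "http://localhost:8088/ksql"

statements = """
CREATE STREAM sensor_readings (sensor_id VARCHAR, temperature DOUBLE)
  WITH (KAFKA_TOPIC='sensor-readings', VALUE_FORMAT='JSON');
CREATE TABLE avg_temp AS
  SELECT sensor_id, AVG(temperature) AS avg_temperature
  FROM sensor_readings
  WINDOW TUMBLING (SIZE 1 MINUTE)
  GROUP BY sensor_id;
"""

req = urllib.request.Request(
    KSQL_URL,
    data=json.dumps({"ksql": statements, "streamsProperties": {}}).encode("utf-8"),
    headers={"Content-Type": "application/vnd.ksql.v1+json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(json.dumps(json.loads(resp.read()), indent=2))
```

The windowed average per sensor is exactly the kind of aggregate a predictive-maintenance monitor would alert on.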
Cloud Native London 2019: FaaS composition using Kafka and CloudEvents (Neil Avery)
Serverless functions, or FaaS, are all the rage.
By leveraging well-established event-driven microservice design principles and applying them to serverless functions, you can build a homogeneous ecosystem to run FaaS applications. Kafka’s natural ability to store and replay events means serverless functions can not only be replayed, but they can also be used to choreograph call chains or be driven using orchestration. Kafka also means you can democratize and organize FaaS environments in a way that scales across the enterprise. Underpinning this mantra is the use of CloudEvents by the CNCF serverless working group (of which Confluent is an active member).
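For a feel of what a CloudEvents-formatted event on Kafka looks like, here is a small sketch producing a structured-mode CloudEvents 1.0 envelope; the event type, topic, and broker address are made up for the example.

```python
import json
import uuid
from datetime import datetime, timezone

from confluent_kafka import Producer

# A CloudEvents 1.0 envelope in structured JSON mode; topic and broker are assumptions.
event = {
    "specversion": "1.0",
    "id": str(uuid.uuid4()),
    "source": "/demo/payments",             # identifies the producing context
    "type": "com.example.payment.created",  # hypothetical event type
    "time": datetime.now(timezone.utc).isoformat(),
    "datacontenttype": "application/json",
    "data": {"paymentId": "p-123", "amount": 42.0},
}

producer = Producer({"bootstrap.servers": "localhost:9092"})
producer.produce("payments", value=json.dumps(event))
producer.flush()
```

Because every function in the chain sees the same envelope shape, choreography and replay become format-agnostic.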
How did we move the mountain? - Migrating 1 trillion+ messages per day across... (HostedbyConfluent)
Have you ever migrated Kafka clusters from one data center to another while staying completely transparent to client applications?
At PayPal, as part of a massive datacenter migration initiative, the Kafka team successfully moved all PayPal Kafka traffic across data centers. This initiative involved migrating 20+ Kafka clusters (1000+ broker and ZooKeeper nodes), as well as 60+ MirrorMaker groups which seamlessly handle Kafka traffic volumes as high as 1 trillion messages per day. Throughout the course of this migration, applications required no modification and encountered zero service outages, zero message loss, and zero duplicated messages. The whole migration process was fully transparent to Kafka applications.
In this session, you will learn the strategies, techniques and tools the PayPal Kafka team has utilized for managing the migration process. You will also learn the lessons and pitfalls they experienced during this exercise, as well as the secret sauce of making the migration successful.
New Features in Confluent Platform 6.0 / Apache Kafka 2.6 (Kai Wähner)
New features in Confluent Platform 6.0 / Apache Kafka 2.6, including REST Proxy and API, Tiered Storage for AWS S3 and GCP GCS, Cluster Linking (on-premise, edge, hybrid, multi-cloud), Self-Balancing Clusters, and ksqlDB.
What is Apache Kafka and What is an Event Streaming Platform? (confluent)
Speaker: Gabriel Schenker, Lead Curriculum Developer, Confluent
Streaming platforms have emerged as a popular, new trend, but what exactly is a streaming platform? Part messaging system, part Hadoop made fast, part fast ETL and scalable data integration. With Apache Kafka® at the core, event streaming platforms offer an entirely new perspective on managing the flow of data. This talk will explain what an event streaming platform such as Apache Kafka is and some of the use cases and design patterns around its use—including several examples of where it is solving real business problems. New developments in this area such as KSQL will also be discussed.
Kafka Summit London 2019 - The Art of the Event-Streaming App (Neil Avery)
Have you ever imagined what it would be like to build a massively scalable streaming application on Kafka, the challenges, the patterns and the thought process involved? How much of the application can be reused? What patterns will you discover? How does it all fit together? Depending upon your use case and business, this can mean many things. Starting out with a data pipeline is one thing, but evolving into a company-wide real-time application that is business critical and entirely dependent upon a streaming platform is a giant leap. Large-scale streaming applications are also called event streaming applications. They are classically different from other data systems; event streaming applications are viewed as a series of interconnected streams that are topologically defined using stream processors; they hold state that models your use case as events. Almost like a deconstructed real-time database.
In this talk, I step through the origins of event streaming systems, understanding how they are developed from raw events to evolve into something that can be adopted at an organizational scale. I start with event-first thinking and Domain-Driven Design to build data models that work with the fundamentals of streams, Kafka Streams, KSQL and serverless (FaaS).
Building upon this, I explain how to build common business functionality by stepping through the patterns for: scalable payment processing; running it on rails (instrumentation and monitoring); and control-flow patterns. Finally, all of these concepts are combined in a solution architecture that can be used at an enterprise scale. I will introduce enterprise patterns such as events-as-a-backbone, events as APIs, and methods for governance and self-service. You will leave the talk with an understanding of how to model events with event-first thinking, how to work towards reusable streaming patterns and, most importantly, how it all fits together at scale.
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features) (Kai Wähner)
High-level introduction to Confluent REST Proxy and Schema Registry (leveraging Apache Avro under the hood), two components of the Apache Kafka open source ecosystem. See the concepts, architecture and features.
Best Practices for Streaming IoT Data with MQTT and Apache Kafka (Kai Wähner)
Organizations today are looking to stream IoT data to Apache Kafka. However, connecting tens of thousands or even millions of devices over unreliable networks can create some architecture challenges. In this session, we will identify and demo some best practices for implementing a large scale IoT system that can stream MQTT messages to Apache Kafka.
We use HiveMQ as an open-source MQTT broker to collect data from IoT devices, ingest the data in real time into an Apache Kafka cluster for preprocessing (using Kafka Streams / KSQL), and perform model training + inference (using TensorFlow 2.0 and its TensorFlow I/O Kafka plugin).
We leverage additional enterprise components from HiveMQ and Confluent to allow easy operations, scalability and monitoring.
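A minimal sketch of the MQTT-to-Kafka bridge pattern the session demos, using paho-mqtt and confluent-kafka; broker addresses and topic names are assumptions, and in production a dedicated broker extension or Kafka Connect would typically do this job.

```python
import paho.mqtt.client as mqtt
from confluent_kafka import Producer

# Broker addresses and topic names are assumptions for the sketch.
producer = Producer({"bootstrap.servers": "localhost:9092"})

def on_message(client, userdata, msg):
    # Use the MQTT topic as the Kafka key so readings from one device
    # stay in one partition (and therefore stay ordered).
    producer.produce("iot-sensor-data", key=msg.topic, value=msg.payload)
    producer.poll(0)  # serve delivery callbacks without blocking

client = mqtt.Client()  # paho-mqtt 1.x style; 2.x requires a callback API version argument
client.on_message = on_message
client.connect("localhost", 1883)          # e.g. a HiveMQ broker
client.subscribe("sensors/+/temperature")  # '+' matches one topic level
client.loop_forever()
```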
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies? (Kai Wähner)
Microservices became the new black in enterprise architectures. APIs provide functions to other applications or end users. Even if your architecture uses a pattern other than microservices, like SOA (Service-Oriented Architecture) or client-server communication, APIs are used between the different applications and end users.
Apache Kafka plays a key role in modern microservice architectures to build open, scalable, flexible and decoupled real-time applications. API Management complements Kafka by providing a way to implement and govern the full life cycle of the APIs.
This session explores how event streaming with Apache Kafka and API Management (including API Gateway and Service Mesh technologies) complement and compete with each other depending on the use case and point of view of the project team. The session concludes by exploring the vision of event streaming APIs instead of RPC calls.
Understand how event streaming with Kafka and Confluent complements tools and frameworks such as Kong, Mulesoft, Apigee, Envoy, Istio, Linkerd, Software AG, TIBCO Mashery, IBM, Axway, etc.
A Streaming API Data Exchange provides streaming replication between business units and companies. API Management with REST/HTTP is not appropriate for streaming data.
GCP for Apache Kafka® Users: Stream Ingestion and Processing (confluent)
Watch this talk here: https://www.confluent.io/online-talks/gcp-for-apache-kafka-users-stream-ingestion-processing
In private and public clouds, stream analytics commonly means stateless processing systems organized around Apache Kafka® or a similar distributed log service. GCP took a somewhat different tack, with Cloud Pub/Sub, Dataflow, and BigQuery, distributing the responsibility for processing among ingestion, processing and database technologies.
We compare the two approaches to data integration and show how Dataflow allows you to join, transform, and deliver data streams among on-prem and cloud Apache Kafka clusters, Cloud Pub/Sub topics and a variety of databases. The session will have a mix of architectural discussions and practical code reviews of Dataflow-based pipelines.
Building a Modern, Scalable Cyber Intelligence Platform with Apache Kafka | J... (HostedbyConfluent)
As cyber threats continuously grow in sophistication and frequency, companies need to quickly acclimate to effectively detect, respond, and protect their environments. At Intel, we’ve addressed this need by implementing a modern, scalable Cyber Intelligence Platform (CIP) based on Splunk and Apache Kafka. We believe that CIP positions us for the best defense against cyber threats well into the future.
Our CIP ingests tens of terabytes of data each day and transforms it into actionable insights through stream processing, context-smart applications, and advanced analytics techniques. Kafka serves as a massive data pipeline within the platform. It gives us the ability to operate on data in-stream, enabling us to reduce Mean Time to Detect (MTTD) and Mean Time to Respond (MTTR). Faster detection and response ultimately leads to better prevention.
In our session, we’ll discuss the details described in the IT@Intel white paper that was published in November 2020 with the same title.
Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL... (confluent)
Kafka Streams and the addition of KSQL have provided opportunities to do stateful processing of data. Sometimes, the biggest challenge is determining how you can join that data. Keying and windowing are core concepts that need to be understood in order to properly and efficiently stream data. In this presentation, Neil will utilize geospatial data to showcase non-trivial joining; particularly, but not limited to, distance comparisons. The stream processing will be written in the Kafka Streams DSL and in KSQL, with the topologies being compared. KSQL 2.0 concepts of User Defined Functions (UDFs), nested Avro structures, and the ‘insert into’ functionality of KSQL will be showcased.
The presentation will show a custom OpenSky connector for obtaining real-time aircraft data, a Streams application for processing that data, a D3 topojson application to visualize the data, and an additional KSQL implementation of the Streams application for comparison. Expect a deep dive into the Streams DSL and KSQL implementations that will provide the basis for a discussion around Apache Kafka and stream processing.
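The distance comparisons mentioned above typically reduce to a great-circle computation; here is a Python version of the haversine predicate such a join might apply (the talk's actual UDF may differ).

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))  # 6371 km: mean Earth radius

# An aircraft position checked against a fixed point, as such a stream join might do:
print(haversine_km(51.4700, -0.4543, 51.5074, -0.1278))  # Heathrow to central London, ~23 km
```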
Benefits of Stream Processing and Apache Kafka Use Cases (confluent)
Watch this talk here: https://www.confluent.io/online-talks/benefits-of-stream-processing-and-apache-kafka-use-cases-on-demand
This talk explains how companies are using event-driven architecture to transform their business and how Apache Kafka serves as the foundation for streaming data applications.
Learn how major players in the market are using Kafka in a wide range of use cases such as microservices, IoT and edge computing, core banking and fraud detection, cyber data collection and dissemination, ESB replacement, data pipelining, ecommerce, mainframe offloading and more.
Also discussed in this talk are the differences between Apache Kafka and Confluent Platform.
This session is part 1 of 4 in our Fundamentals for Apache Kafka series.
Apache Kafka - Scalable Message Processing and More! (Guido Schmutz)
Independent of the source of data, the integration of event streams into an enterprise architecture gets more and more important in the world of sensors, social media streams and the Internet of Things. Events have to be accepted quickly and reliably, and they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. How can we make sure that all these events are accepted and forwarded in an efficient and reliable way? This is where Apache Kafka comes into play: a distributed, highly scalable message broker, built for exchanging huge amounts of messages between a source and a target.
This session will start with an introduction to Apache Kafka and present the role of Apache Kafka in a modern data / information architecture and the advantages it brings to the table. Additionally, the Kafka ecosystem will be covered, as well as the integration of Kafka in the Oracle stack, with products such as GoldenGate, Service Bus and Oracle Stream Analytics all being able to act as a Kafka consumer or producer.
Shattering The Monolith(s) (Martin Kess, Namely), Kafka Summit SF 2019 (confluent)
Namely is a late-stage startup that builds HR, Payroll and Benefits software for mid-sized businesses. Over the years, we've ended up with a number of monolithic and legacy applications covering overlapping domain concepts, which has limited our ability to deliver new and innovative features to our customers. We need a way to get our data out of the monoliths to decouple our systems and increase our velocity. We've chosen Kafka as our way to liberate our data in a reliable, scalable and maintainable way. This talk covers specific examples of successes and missteps in our move to Kafka as the backbone of our architecture. It then looks to the future: where we are trying to go, and how we plan on getting there, from both the short-term and long-term perspectives.
Key Takeaways:
- Successful and unsuccessful approaches to gradually introducing Kafka to a large organization in a way that meets the short- and long-term needs of the business.
- Successful and unsuccessful patterns for using Kafka.
- Pragmatism versus purism: building Kafka-first systems, and migrating legacy systems to Kafka with Debezium.
- Combining event-driven systems with RPC-based systems. Observability, alerting and testing.
- Actionable steps that you can take to your organization to help drive adoption.
MLOps with a Feature Store: Filling the Gap in ML Infrastructure (Data Science Milan)
A Feature Store enables machine learning (ML) features to be registered, discovered, and used as part of ML pipelines, thus making it easier to transform and validate the training data that is fed into machine learning systems. Feature stores can also enable consistent engineering of features between training and inference, but to do so, they need a common data processing platform. The first Feature Stores, developed at hyperscale AI companies such as Uber, Airbnb, and Facebook, enabled feature engineering using domain specific languages, providing abstractions tailored to the companies’ feature engineering domains. However, a general purpose Feature Store needs a general purpose feature engineering, feature selection, and feature transformation platform.
In this talk, we describe how we built a general purpose, open-source Feature Store for ML around dataframes and Apache Spark. We will demonstrate how data engineers can transform and engineer features from backend databases and data lakes, while data scientists can use PySpark to select and transform features into train/test data in a file format of choice (.tfrecords, .npy, .petastorm, etc.) on a file system of choice (S3, HDFS). Finally, we will show how the Feature Store enables end-to-end ML pipelines to be factored into feature engineering and data science stages that can each run at different cadences.
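A hedged sketch of the data-scientist side of that workflow in PySpark; the input table, column names, and output paths are invented, and Parquet stands in for formats like .tfrecords that need extra connectors.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("feature-demo").getOrCreate()

# Hypothetical raw table; in a feature store this would come from a backend
# database or data lake rather than a local file.
raw = spark.read.parquet("/data/orders.parquet")

# Engineer a couple of features, then split into train/test sets.
features = raw.select(
    "customer_id",
    F.datediff(F.current_date(), F.col("last_order_date")).alias("days_since_order"),
    (F.col("total_spend") / F.col("order_count")).alias("avg_order_value"),
)
train, test = features.randomSplit([0.8, 0.2], seed=42)
train.write.mode("overwrite").parquet("/data/train")
test.write.mode("overwrite").parquet("/data/test")
```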
Bio:
Fabio Buso is the head of engineering at Logical Clocks AB, where he leads the Feature Store development. Fabio holds a master's degree in cloud computing and services with a focus on data intensive applications, awarded by a joint program between KTH Stockholm and TU Berlin.
Topics: feature store, MLOps.
The aim of this report is to introduce developers to the world of Magento optimization, giving suggestions and practical examples of the best practices to apply.
Windows Server AppFabric Caching - What it is & when you should use it? (Robert MacLean)
This is from my Tech-Ed Africa 2010 talk. For more information see: http://www.sadev.co.za/content/teched-africa-2010-slides-scripts-and-demos-my-talks
This session looks at what AppFabric Caching is from start to deep dive.
Ch-ch-ch-ch-changes... Stitch Triggers - Andrew Morgan (MongoDB)
Intelligent apps are emerging as the next frontier in analytics and application development. Learn how to build intelligent apps on MongoDB powered by Google Cloud with TensorFlow for machine learning and DialogFlow for artificial intelligence. Get your developers and data scientists to finally work together to build applications that understand your customer, automate their tasks, and provide knowledge and decision support.
Building Continuous Application with Structured Streaming and Real-Time Data ... (Databricks)
One of the biggest challenges in data science is to build a continuous data application which delivers results rapidly and reliably. Spark Streaming offers a powerful solution for real-time data processing. However, the challenge remains how to connect it with various continuous and real-time data sources while guaranteeing the responsiveness and reliability of data applications.
In this talk, Nan and Arijit will summarize the lessons learned from serving real-time Spark-based data analytic solutions on Azure HDInsight. Their solution seamlessly integrates Spark and Azure Event Hubs, a hyper-scale telemetry ingestion service that enables users to ingress massive amounts of telemetry into the cloud and read the data from multiple applications using publish-subscribe semantics.
They'll cover three topics: bridging the gap between the data communication models of Spark and the data source, accommodating Spark to the rate control and message addressing of the data source, and the co-design of fault-tolerance mechanisms. This talk will share insights on how to build continuous data applications with Spark and expand the availability of connectors between Spark and different real-time data sources.
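As a rough sketch of such a continuous application, the snippet below reads a stream with Spark Structured Streaming via the Kafka-compatible endpoint that Event Hubs exposes; the address and topic are placeholders, and the SASL auth options Event Hubs requires in practice are omitted.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("continuous-app").getOrCreate()

# Needs the spark-sql-kafka package on the classpath; SASL options omitted.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "mynamespace.servicebus.windows.net:9093")
    .option("subscribe", "telemetry")
    .load()
)

# Count events per one-minute window; the checkpoint gives fault tolerance.
counts = (
    events.withColumn("body", F.col("value").cast("string"))
    .groupBy(F.window(F.col("timestamp"), "1 minute"))
    .count()
)
query = (
    counts.writeStream.outputMode("complete")
    .format("console")
    .option("checkpointLocation", "/tmp/checkpoints/telemetry")
    .start()
)
query.awaitTermination()
```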
10 Principles for Effective Event-Driven Microservices (Ben Stopford)
This talk includes an introduction to the Kafka ecosystem as well as event-driven microservices, culminating with 10 rules that help with the design of such systems:
1. Don’t use Kafka for shopping carts!
2. Pick Topics with Business Significance
3. Decouple publishers from subscribers
4. Use the log to regenerate state (see the sketch after this list)
5. Apply the Single Writer Principle
6. Leverage keeping datasets inside the broker
7. Prefer stream processing over maintaining historic views
8. Sometimes you need historic views. => Replicate Read Only
9. Use Schemas
10. Consider “Stream Management” Services
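Principle 4, regenerating state from the log, can be sketched with a consumer that replays a (typically compacted) topic from offset zero; the broker, topic, and group id below are assumptions.

```python
from confluent_kafka import Consumer, TopicPartition

# Rebuild an in-memory view by replaying a topic from the beginning.
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "state-rebuilder",
    "enable.auto.commit": False,
    "auto.offset.reset": "earliest",
})
consumer.assign([TopicPartition("customer-state", 0, 0)])  # partition 0, offset 0

state = {}
while True:
    msg = consumer.poll(1.0)
    if msg is None:
        break  # caught up (a demo shortcut; production code watches the high-water mark)
    if msg.error():
        raise RuntimeError(msg.error())
    state[msg.key()] = msg.value()  # last write wins, like a table
print(f"rebuilt {len(state)} keys")
```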
Continuous Deployment: The Dirty Details (Mike Brittain)
Presented at ALM Summit 3 in Redmond, WA. January 2013.
Like what you've read? We're frequently hiring for a variety of engineering roles at Etsy. If you're interested, drop me a line or send me your resume: mike@etsy.com.
http://www.etsy.com/careers
Presentation on Integration Services in SQL Server 2008.
Ing. Eduardo Castro Martinez, PhD
Microsoft SQL Server MVP
http://ecastrom.blogspot.com
http://comunidadwindows.org
How to build unified Batch & Streaming Pipelines with Apache Beam and Dataflow (Daniel Zivkovic)
Apache Beam is a beautiful framework that blurs the line between Batch and Streaming, so check out this interactive tutorial by Patrick Lecuyer - Head of Specialist Customer Engineering at Google Canada. His examples run on GCP Dataflow, but what you'll learn will be portable across clouds, and distributed processing engines like Apache Flink, Apache Samza, Apache Spark, IBM Streams... regardless of where you do your Big Data processing!
The meetup recording with TOC for easy navigation is at https://youtu.be/7pUYKX40RfA.
P.S. For more interactive lectures like this, go to http://youtube.serverlesstoronto.org/ or sign up for our upcoming live events at https://www.meetup.com/Serverless-Toronto/events/
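A minimal Beam pipeline in Python showing the unified model the tutorial covers: the same transforms run in batch here, and would run over a windowed unbounded source in streaming mode.

```python
import apache_beam as beam

# A small in-memory batch source stands in for a Pub/Sub or Kafka stream;
# in streaming mode you would add e.g.
#   beam.WindowInto(beam.window.FixedWindows(60))
# before the aggregation.
with beam.Pipeline() as p:
    (
        p
        | "Create" >> beam.Create([("clicks", 1), ("views", 1), ("clicks", 1)])
        | "SumPerKey" >> beam.CombinePerKey(sum)
        | "Print" >> beam.Map(print)
    )
```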
This is an adaptation of the presentation given at the SpringOne 2008 conference in Hollywood, FL. It contains some updates on project status, and also information about the recently published book "Spring Python 1.1"
This slideshow is licensed under a Creative Commons Attribution 3.0 United States License.
10 Principles for Effective Event-Driven Microservices with Apache Kafka (Ben Stopford)
This talk includes an introduction to the Kafka ecosystem as well as event-driven microservices, culminating with 10 rules that help with the design of such systems:
1. Don’t use Kafka for shopping carts!
2. Pick Topics with Business Significance
3. Decouple publishers from subscribers
4. Use the log to regenerate state
5. Apply the Single Writer Principle
6. Leverage keeping datasets inside the broker
7. Prefer stream processing over maintaining historic views
8. Sometimes you need historic views. => Replicate Read Only
9. Use Schemas
10. Consider “Stream Management” Services
Vector Search / Generative AI introduction at Pulsar Meetup (Devin Bost)
Presentation at the Pulsar Meetup in Bangalore on the use of vector search and generative AI when integrated with streaming. It explores the mechanics of how vector embeddings work, the backdrop of retrieval-augmented generation, and how vector databases like AstraDB enable a rich generative experience for AI applications.
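Under the hood, vector search usually ranks embeddings by cosine similarity; here is a toy sketch with invented 4-dimensional vectors (real embedding models produce hundreds of dimensions, and databases like AstraDB index them rather than scanning linearly).

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity in [-1, 1]; nearer 1 means more semantically similar embeddings."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "embeddings"; a real model would produce these from text.
query = np.array([0.1, 0.8, 0.3, 0.0])
docs = {
    "doc-a": np.array([0.1, 0.7, 0.4, 0.1]),
    "doc-b": np.array([0.9, 0.1, 0.0, 0.2]),
}
best = max(docs, key=lambda name: cosine_similarity(query, docs[name]))
print(best)  # doc-a: closest in direction, hence most similar
```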
Streaming Patterns and Best Practices with Apache Pulsar for Enabling Machine... (Devin Bost)
Featuring a TON of patterns, wisdom, and analogies to help you understand how to maximize the value of your data and get increased ROI on your machine learning and data architecture.
Features Apache Pulsar, Flink, Druid, Imply, and many other technologies.
How to introduce Apache Pulsar into your organization successfully - Devin Bost
This presentation covers principles and best practices based on significant research regarding how to maximize the likelihood of success with any technological adoption or revolution in your company, especially involving Apache Pulsar.
Pulsar Architectural Patterns for CI/CD Automation and Self-Service (Devin Bost)
We examine real-world architectural patterns involving Apache Pulsar to automate the creation of function and pub/sub flows for improved operational scalability and ease of management. We’ll cover CI/CD automation patterns and reveal our innovative approach of leveraging streaming data to create a self-service platform that automates the provisioning of new users. We will also demonstrate how to create function flows through patterns and configuration, enabling non-developer users to create entire function flows simply by changing configurations. These patterns enable us to drive the automation of managing Pulsar to a whole new level. We also cover CI/CD for on-prem, GCP, and AWS users.
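To make the function-flow idea concrete, here is a hedged sketch of a Pulsar Function in Python that routes events onward based on content; the topic names are assumptions, and in the configuration-driven approach described above they would come from config rather than being hard-coded.

```python
from pulsar import Function

class RouteBySeverity(Function):
    """One stage of a function flow: route each event onward based on its
    content, so downstream wiring can change without touching producers."""

    def process(self, input, context):
        # Output topics would come from the flow configuration in practice;
        # these names are invented for the sketch.
        topic = (
            "persistent://public/default/alerts"
            if "ERROR" in input
            else "persistent://public/default/events"
        )
        context.publish(topic, input)
        return None
```

Deployed behind an input topic, a chain of such functions forms the kind of flow that configuration alone can rewire.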
Apache Pulsar - Real-time data flows drive core business processes (Devin Bost)
The most profitable companies are leveraging real-time data flows to drive core business processes. Studies show that these companies are focusing on data streams, rather than on data sets. Why is streaming so much more powerful as a business capability? We will examine this paradigm shift and introduce Apache Pulsar, the next generation streaming platform.
Devin Bost - a Senior Data Engineer at Overstock - will talk about why you should choose streaming architecture for your organization to solve data problems, how you can solve problems with this new paradigm, and how you can leverage Pulsar to drive business decisions at scale.
Sponsored by: Overstock, Recursion Pharmaceuticals, Google Cloud, Snowflake, Ternary Data, Pluralsight.
This is a presentation on natural language processing (NLP), machine learning (ML), and Big Data. I introduce neural networks, unsupervised machine learning, and a variety of natural language processing techniques, and I cover architectural best practices, ways to implement data science for practical applications, and how to deal with organizational roadblocks.
Listen to the keynote address and hear about the latest developments from Rachana Ananthakrishnan and Ian Foster who review the updates to the Globus Platform and Service, and the relevance of Globus to the scientific community as an automation platform to accelerate scientific discovery.
Designing for Privacy in Amazon Web Services (KrzysztofKkol1)
Data privacy is one of the most critical issues that businesses face. This presentation shares insights on the principles and best practices for ensuring the resilience and security of your workload.
Drawing on a real-life project from the HR industry, the various challenges will be demonstrated: data protection, self-healing, business continuity, security, and transparency of data processing. This systematized approach allowed us to create a secure AWS cloud infrastructure that not only met strict compliance rules but also exceeded the client's expectations.
Field Employee Tracking System | MiTrack App | Best Employee Tracking Solution... (informapgpstrackings)
Keep tabs on your field staff effortlessly with Informap Technology Centre LLC. Real-time tracking, task assignment, and smart features for efficient management. Request a live demo today!
For more details, visit us: https://informapuae.com/field-staff-tracking/
Globus Compute with IRI Workflows - GlobusWorld 2024 (Globus)
As part of the DOE Integrated Research Infrastructure (IRI) program, NERSC at Lawrence Berkeley National Lab and ALCF at Argonne National Lab are working closely with General Atomics on accelerating the computing requirements of the DIII-D experiment. As part of the work, the team is investigating ways to speed up the time to solution for many different parts of the DIII-D workflow, including how they run jobs on HPC systems. One of these routes is looking at Globus Compute as a way to replace the current method for managing tasks, and we describe a brief proof of concept showing how Globus Compute could help to schedule jobs and be a tool to connect compute at different facilities.
In software engineering, the right architecture is essential for robust, scalable platforms. Wix has undergone a pivotal shift from event sourcing to a CRUD-based model for its microservices. This talk will chart the course of this pivotal journey.
Event sourcing, which records state changes as immutable events, provided robust auditing and "time travel" debugging for Wix Stores' microservices. Despite its benefits, the complexity it introduced in state management slowed development. Wix responded by adopting a simpler, unified CRUD model. This talk will explore the challenges of event sourcing and the advantages of Wix's new "CRUD on steroids" approach, which streamlines API integration and domain event management while preserving data integrity and system resilience.
Participants will gain valuable insights into Wix's strategies for ensuring atomicity in database updates and event production, as well as caching, materialization, and performance optimization techniques within a distributed system.
Join us to discover how Wix has mastered the art of balancing simplicity and extensibility, and learn how the re-adoption of the modest CRUD has turbocharged their development velocity, resilience, and scalability in a high-growth environment.
Globus Connect Server Deep Dive - GlobusWorld 2024 – Globus
We explore the Globus Connect Server (GCS) architecture and experiment with advanced configuration options and use cases. This content is targeted at system administrators who are familiar with GCS and currently operate—or are planning to operate—broader deployments at their institution.
Developing Distributed High-performance Computing Capabilities of an Open Sci... – Globus
COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and its broad response from the scientific community has forged new relationships among public health practitioners, mathematical modelers, and scientific computing specialists, while revealing critical gaps in exploiting advanced computing systems to support urgent decision making. Informed by our team’s work in applying high-performance computing in support of public health decision makers during the COVID-19 pandemic, we present how Globus technologies are enabling the development of an open science platform for robust epidemic analysis, with the goal of collaborative, secure, distributed, on-demand, and fast time-to-solution analyses to support public health.
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv... – Shahin Sheidaei
Games are powerful teaching tools, fostering hands-on engagement and fun. But they require careful consideration to succeed. Join me to explore factors in running and selecting games, ensuring they serve as effective teaching tools. Learn to maintain focus on learning objectives while playing, and how to measure the ROI of gaming in education. Discover strategies for pitching gaming to leadership. This session offers insights, tips, and examples for coaches, team leads, and enterprise leaders seeking to teach from simple to complex concepts.
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus... – Globus
Large Language Models (LLMs) are currently the center of attention in the tech world, particularly for their potential to advance research. In this presentation, we'll explore a straightforward and effective method for quickly initiating inference runs on supercomputers using the vLLM tool with Globus Compute, specifically on the Polaris system at ALCF. We'll begin by briefly discussing the popularity and applications of LLMs in various fields. Following this, we will introduce the vLLM tool, and explain how it integrates with Globus Compute to efficiently manage LLM operations on Polaris. Attendees will learn the practical aspects of setting up and remotely triggering LLMs from local machines, focusing on ease of use and efficiency. This talk is ideal for researchers and practitioners looking to leverage the power of LLMs in their work, offering a clear guide to harnessing supercomputing resources for quick and effective LLM inference.
Paketo Buildpacks: The Best Way to Build OCI Images? DevopsDa... – Anthony Dahanne
Buildpacks have been around for more than 10 years! At first, they were used to detect and build an application before deploying it to certain PaaS platforms. Later, their latest generation, the Cloud Native Buildpacks (a CNCF incubating project), let us create Docker (OCI) images. Are they a good alternative to the Dockerfile? What are the Paketo buildpacks? Which communities support them, and how?
Come find out in this ignite session.
Advanced Flow Concepts Every Developer Should Know – Peter Caitens
Tim Combridge from Sensible Giraffe and Salesforce Ben presents some important tips that all developers should know when dealing with Flows in Salesforce.
Accelerate Enterprise Software Engineering with Platformless – WSO2
Key takeaways:
• Challenges of building platforms and the benefits of platformless.
• Key principles of platformless, including API-first, cloud-native middleware, platform engineering, and developer experience.
• How Choreo enables the platformless experience.
• How key concepts like application architecture, domain-driven design, zero trust, and cell-based architecture are inherently a part of Choreo.
• A demo of an end-to-end app built and deployed on Choreo.
Enhancing Research Orchestration Capabilities at ORNL – Globus
Cross-facility research orchestration comes with ever-changing constraints regarding the availability and suitability of various compute and data resources. In short, a flexible data and processing fabric is needed to enable the dynamic redirection of data and compute tasks throughout the lifecycle of an experiment. In this talk, we illustrate how we easily leveraged Globus services to instrument the ACE research testbed at the Oak Ridge Leadership Computing Facility with flexible data and task orchestration capabilities.
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR – Tier1 app
Even though at the surface level 'java.lang.OutOfMemoryError' appears to be one single error, there are actually 9 types of OutOfMemoryError underneath. Each type has different causes, diagnosis approaches, and solutions. This session equips you with the knowledge, tools, and techniques needed to troubleshoot and conquer OutOfMemoryError in all its forms, ensuring smoother, more efficient Java applications.
1. Real-World Pulsar Architectural Patterns: Distributed Caching + Distributed Tracing
By Devin Bost, Senior Data Engineer at Overstock
Every pattern shown here has been developed and implemented with my team at Overstock.
Email: dbost@overstock.com | Twitter: DevinBost | LinkedIn: https://www.linkedin.com/in/devinbost/
14.–18. [Diagram, built up across five slides: Producers publish to an /ingest topic, a passthrough function forwards messages to a /feeds topic, and Consumers subscribe to /feeds.] It's much safer to use a distributed cache technology like Ignite: it offers smart persistence, is faster than Redis, supports tables with a backing cache, and supports transactions.
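To make the passthrough concrete, here is a minimal sketch of what such a function can look like with the Pulsar Functions API. This is an illustration, not the deck's actual code: the input (/ingest) and output (/feeds) topics are bound in the function's deployment config rather than in the class itself.

```java
import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;

// Minimal passthrough Pulsar Function: whatever arrives on the input topic
// is returned unchanged, which publishes it to the configured output topic.
public class PassthroughFunction implements Function<String, String> {
    @Override
    public String process(String input, Context context) {
        return input;
    }
}
```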
19. What if you have a business-critical service that can't lose messages?
25. [Diagram: Producers publish to /ingest; a passthrough function forwards to /feeds; messages are replicated to persistent storage; a Backfill Topic feeds Pulsar Functions that ask "Message delivered yet?" (triggered by a batch job or other mechanism) and send alerts to end users (e.g. Email, SMS, Twilio call, etc.).] You could add another passthrough function and topic if you want more isolation.
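On the producer side of a can't-lose-messages flow, a common approach is to block on the broker's persistence acknowledgment so a failed send can be retried or redirected to the backfill topic. A hedged sketch with the Pulsar client; the service URL and topic name are illustrative:

```java
import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

public class SafeSend {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // illustrative URL
                .build();
        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://public/default/ingest")
                .create();
        // send() blocks until the broker acknowledges persistence, so a
        // failure surfaces here and the message can be retried or rerouted
        // to a backfill topic.
        MessageId id = producer.send("payload".getBytes());
        producer.close();
        client.close();
    }
}
```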
26. How about just for ingesting data into a cache with a backfill?
28. Option 2: achieves separation of concerns and prevents QoS problems with live traffic when running a backfill. [Diagram: a Web Service publishes to /ingest; a Passthrough Function forwards to /feeds, which a Cache Sink consumes; the data is also replicated to persistent append-only storage; a Batch Engine (e.g. Spark, NiFi, etc.) reads all the data and either loads it into a backfill topic (consumed by its own Cache Sink) or starts a job.]
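A hedged sketch of what the Cache Sink can look like using Pulsar's Sink API with Ignite as the cache; the cache name, key handling, and Ignite startup are assumptions for illustration:

```java
import java.util.Map;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.pulsar.functions.api.Record;
import org.apache.pulsar.io.core.Sink;
import org.apache.pulsar.io.core.SinkContext;

// Each record from /feeds is written into an Ignite cache keyed by the
// message key. "feeds" is an illustrative cache name.
public class IgniteCacheSink implements Sink<String> {
    private Ignite ignite;
    private IgniteCache<String, String> cache;

    @Override
    public void open(Map<String, Object> config, SinkContext ctx) {
        ignite = Ignition.start(); // assumes Ignite config is on the classpath
        cache = ignite.getOrCreateCache("feeds");
    }

    @Override
    public void write(Record<String> record) {
        cache.put(record.getKey().orElse("unknown"), record.getValue());
        record.ack(); // acknowledge only after the cache write succeeds
    }

    @Override
    public void close() {
        ignite.close();
    }
}
```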
29. Option 3: [Diagram: a Passthrough Function feeds the /feeds topic with retention; a Function and Cache Sink consume live traffic; a second Function (stopped until needing to backfill) reads the same topic with an Exclusive-mode subscription (the subscription stores its position in BookKeeper automatically) and, when started, drives a Backfill Cache Sink; tiered storage offloads older data to S3 or Google Cloud.] Note: you need to ensure the BookKeeper cluster is fast enough to keep up with the brokers, or your brokers' memory will fill up. Also, this approach will only give you a single backfill run unless you have additional replication.
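One hedged way to implement the stopped-until-needed backfill consumer is a Pulsar Reader that replays the retained topic from the earliest message; the topic name is illustrative, and the cache write is elided:

```java
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Reader;

public class BackfillReplay {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();
        // Start from the earliest retained message (including data offloaded
        // to tiered storage) and re-apply everything to the backfill cache sink.
        Reader<byte[]> reader = client.newReader()
                .topic("persistent://public/default/feeds")
                .startMessageId(MessageId.earliest)
                .create();
        while (reader.hasMessageAvailable()) {
            Message<byte[]> msg = reader.readNext();
            // ... write msg into the cache here ...
        }
        reader.close();
        client.close();
    }
}
```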
31. [Diagram: a Website backed by a legacy SQL DB emits raw clicks toward a Web Application emitting specific events; the flow filters to the desired clicks, extracts the relevant data, enriches it from cached data, and stores the result in the cache. (Omitting passthrough details for simplicity.)]
32. [Same diagram.] You can also emit directly to Pulsar as a producer. It's simpler if you have the ability to touch the website code.
33. [Same diagram.] However, the raw clicks flow still gets messy.
34. [Diagram: the Web Application instead emits purposeful events (Event A, Event B, Event C) to dedicated topics, which are enriched from cached data and stored in the cache.] Cleaner and better separation of concerns to have purposeful topics... easier to debug and maintain.
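Slide 32's suggestion of emitting directly from the website might look like this hedged sketch; the topic name and payload are made up for illustration, with one producer per purposeful event topic:

```java
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;

public class EventEmitter {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();
        // A dedicated producer for one purposeful event type (hypothetical
        // topic name), instead of funneling everything through raw clicks.
        Producer<String> addToCart = client.newProducer(Schema.STRING)
                .topic("persistent://web/events/add-to-cart")
                .create();
        addToCart.send("{\"productId\":\"20603199\",\"qty\":1}");
        addToCart.close();
        client.close();
    }
}
```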
35. What if you're using a graph engine for more complex query logic but need that data in real-time?
36. If you don't make it synchronous, you will get race conditions when updating and querying the graph! [Diagram: a Web Application emitting specific deltas (e.g. state change, increment, etc.) feeds a Synchronous Update Function that (1) writes the change and waits for success, then (2) returns on completion; downstream, a complex graph query gets the full record.]
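A hedged sketch of the Synchronous Update Function idea: the graph write must complete before the function returns, so a downstream query can never observe the graph mid-update. GraphClient is a hypothetical stand-in for your graph engine's async driver (e.g. a Gremlin or Cypher client):

```java
import java.util.concurrent.CompletableFuture;
import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;

public class SynchronousUpdateFunction implements Function<String, String> {

    /** Hypothetical graph driver; applyDelta completes when the write is durable. */
    interface GraphClient {
        CompletableFuture<Void> applyDelta(String delta);
    }

    private GraphClient graph; // assume initialized from the function's config

    @Override
    public String process(String delta, Context context) throws Exception {
        graph.applyDelta(delta).get(); // (1) write change, wait for success
        return delta;                  // (2) return on completion
    }
}
```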
37. What if you need a more robust verification that deltas are in order and aren't duplicates? (e.g. financially impacting increment/decrement values or state variables)
38. Usually, it's best to separate concerns into separate functions, like this. [Diagram: a Gate Keeper Filter sits in front of the Synchronous Update Function; it checks whether the message has been seen already and asks "Duplicate or late?" Yes: drop the message. No: pass it on to the synchronous update flow ((1) write change, wait for success; (2) return on completion; get full record; complex graph query).] But, is that the right approach here?
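A hedged sketch of the Gate Keeper Filter pattern (which the next slides argue against for this case), using Pulsar Functions' built-in BookKeeper-backed state store; extracting the message ID from the payload is an assumption left as a placeholder:

```java
import java.nio.ByteBuffer;
import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;

public class GateKeeperFilter implements Function<String, String> {
    @Override
    public String process(String input, Context context) throws Exception {
        String messageId = extractId(input); // hypothetical: parse from payload
        if (context.getState(messageId) != null) {
            return null; // duplicate or late: returning null emits nothing
        }
        context.putState(messageId, ByteBuffer.allocate(0)); // mark as seen
        return input;
    }

    private String extractId(String payload) {
        return payload; // placeholder for real ID extraction
    }
}
```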
40. [Diagram: the same flow scaled out, with many parallel instances of the Gate Keeper Filter and the Synchronous Update Function between the Web Application (emitting specific deltas, e.g. state change, increment, etc.) and the graph ((1) write change, wait for success; (2) return on completion; get full record; complex graph query).]
41. In this case, the right approach is to consolidate your logic to leverage the transactional guarantees of your database. [Diagram: the Synchronous Update Function now (1) checks whether the delta is duplicate or outdated and, if not, writes the change and waits for success, then (2) returns on completion; a complex graph query gets the full record.]
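A hedged sketch of that consolidation in JDBC terms: the duplicate/out-of-order check and the write happen in a single statement inside one transaction, so parallel function instances cannot interleave between the check and the write. Table and column names are made up for illustration:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import javax.sql.DataSource;

public class TransactionalDeltaWriter {
    private final DataSource dataSource;

    public TransactionalDeltaWriter(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    /** Applies the delta only if it is newer than what is already stored. */
    public boolean applyDelta(String productId, String newValue, Timestamp eventTime)
            throws SQLException {
        try (Connection conn = dataSource.getConnection()) {
            conn.setAutoCommit(false);
            try (PreparedStatement stmt = conn.prepareStatement(
                    "UPDATE product_state SET value = ?, updated_at = ? "
                  + "WHERE product_id = ? AND updated_at < ?")) {
                stmt.setString(1, newValue);
                stmt.setTimestamp(2, eventTime);
                stmt.setString(3, productId);
                stmt.setTimestamp(4, eventTime);
                int rows = stmt.executeUpdate();
                conn.commit();
                return rows > 0; // 0 rows => duplicate or outdated delta
            } catch (SQLException e) {
                conn.rollback();
                throw e; // the calling function should retry on failure
            }
        }
    }
}
```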
42. Leverage the transactional guarantees of your database! (Your function will need to retry if a transaction fails.) [Diagram: many parallel Synchronous Update Function instances between the Web Application and the graph (get full record; complex graph query).]
43. [Same diagram.] Always be mindful of how the behavior might change when function parallelism is turned up!
44. [Same diagram.] If making state changes, be sure that you get timestamps on your upstream data contract so you can verify that the messages are in order!
45. Now, what happens when you need to debug a large or complex function flow?
47.–50. [Diagram repeated across four slides: a tangled mesh of Functions, Web Applications, and Web Services.] The questions build up slide by slide: What happens if some messages seem to not be reaching their destination? What happens if a message isn't getting transformed correctly at some point, or null values are appearing? What if we can't modify the function code (since it's a multi-tenant application)? What if we can't modify the data contracts either, for the same reason?
60. You can tap ANY topic! [Diagram: TapFunctions tap each topic in the flow as messages pass through the functions (Message1, Message2, Message3 become Message1', Message2', Message3', then Message1'', Message2'', Message3'', etc.).] The TapFunction defines the correlationKey in its Pulsar config, and the CorrelationId is derived and put into the envelope produced by the TapFunction. Taps wrap each message with a header containing: CorrelationId, Tenant, Namespace, Name, timestamp, etc. For example:

{ "correlationKey": "productId",
  "correlationValue": "20603199",
  "correlationId": "productId-20603199",
  . . .
}
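A hedged sketch of what such a TapFunction could look like; the envelope field names mirror the example above, but the JSON handling via Jackson and the default correlationKey are assumptions:

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;
import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;

// Wraps each message in an envelope carrying the derived correlationId plus
// topic metadata from the function context, then emits it downstream.
public class TapFunction implements Function<String, String> {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    @Override
    public String process(String input, Context context) throws Exception {
        String key = (String) context.getUserConfigValue("correlationKey")
                                     .orElse("productId"); // assumed default
        JsonNode payload = MAPPER.readTree(input);
        String value = payload.path(key).asText();

        ObjectNode envelope = MAPPER.createObjectNode();
        envelope.put("correlationKey", key);
        envelope.put("correlationValue", value);
        envelope.put("correlationId", key + "-" + value);
        envelope.put("tenant", context.getTenant());
        envelope.put("namespace", context.getNamespace());
        envelope.put("name", context.getFunctionName());
        envelope.put("timestamp", System.currentTimeMillis());
        envelope.set("payload", payload);
        return MAPPER.writeValueAsString(envelope);
    }
}
```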
61. JoinerFunction: uses Flink's stateful join capability. It all happens in a keyed stream! [Diagram: messages grouped by CorrelationId, e.g. CorrelationId=productId-784 → Message1, Message1', Message1''; CorrelationId=productId-142 → Message2, Message2', Message2''; CorrelationId=productId-923 → Message3, Message3', Message3''.]
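A minimal sketch of the keyed-stream idea, assuming a hypothetical TapEnvelope POJO mirroring the tap envelope; how the stream is sourced from Pulsar, and when the joined result is flushed, are omitted:

```java
import java.util.List;
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class Joiner {
    public static class TapEnvelope {
        public String correlationId;
    }

    // Keying by correlationId puts every tapped version of a message into
    // the same keyed state, where it can be joined.
    public static void join(DataStream<TapEnvelope> taps) {
        taps.keyBy(e -> e.correlationId)
            .process(new KeyedProcessFunction<String, TapEnvelope, List<TapEnvelope>>() {
                private transient ListState<TapEnvelope> seen;

                @Override
                public void open(Configuration parameters) {
                    seen = getRuntimeContext().getListState(
                            new ListStateDescriptor<>("seen", TapEnvelope.class));
                }

                @Override
                public void processElement(TapEnvelope e, Context ctx,
                        Collector<List<TapEnvelope>> out) throws Exception {
                    seen.add(e);
                    // A real join would also register a timer to emit the
                    // joined hops and clear state once the flow completes.
                }
            });
    }
}
```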
62. SamplerFunction: just a simple Pulsar filter function. It allows us to specify a rate to limit how many messages are sampled; for dev, this is set to 100% to allow all. [Diagram: CorrelationId=productId-784 → Message1, Message1', Message1''.]
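A hedged sketch of such a filter; the "sampleRate" config key is an assumption, and returning null from a Pulsar Function emits nothing:

```java
import java.util.concurrent.ThreadLocalRandom;
import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;

public class SamplerFunction implements Function<String, String> {
    @Override
    public String process(String input, Context context) {
        // sampleRate in [0.0, 1.0]; 1.0 (100%) in dev to allow everything.
        double rate = Double.parseDouble((String) context
                .getUserConfigValueOrDefault("sampleRate", "1.0"));
        return ThreadLocalRandom.current().nextDouble() < rate ? input : null;
    }
}
```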
63. Jaeger Sink: these Spans emit to Jaeger and can be stored in a Cassandra OR Elasticsearch backend for production. [Diagram: for CorrelationId=productId-784, each hop (Message1 → Message1' → Message1'' → Message1''') becomes a Span with a StartTimestamp and an EndTimestamp.]
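A hedged sketch of the Jaeger Sink's core step: turning one tap envelope's timestamps into a Span via the OpenTracing API. The envelope fields passed in here (name, correlationId, start/end micros) are assumptions mirroring the envelope header:

```java
import io.jaegertracing.Configuration;
import io.opentracing.Span;
import io.opentracing.Tracer;

public class JaegerEmit {
    public static void emit(String name, String correlationId,
                            long startMicros, long endMicros) {
        // Service name is illustrative; backend (Cassandra/Elasticsearch)
        // is configured on the Jaeger side, not here.
        Tracer tracer = Configuration.fromEnv("pulsar-traces").getTracer();
        Span span = tracer.buildSpan(name)
                .withTag("correlationId", correlationId)
                .withStartTimestamp(startMicros)
                .start();
        span.finish(endMicros);
    }
}
```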
65. If fields are omitted from the tap's config, we capture all that we can.
66. Another trick is to provide alternative representations of a value to make search/analytics easier downstream.
68. Real-World Pulsar Architectural Patterns: Distributed Caching + Distributed Tracing
By Devin Bost, Senior Data Engineer at Overstock
Every pattern shown here has been developed and implemented with my team at Overstock.
Email: dbost@overstock.com | Twitter: DevinBost | LinkedIn: https://www.linkedin.com/in/devinbost/