One of the biggest challenges for today’s microservice generation is data, which gets split into fragments that are spread across a company, making it hard to get a joined-up view. One solution is to have a single, shared database that all services can access, but sharing databases across different services is a well-known anti-pattern. What if instead you shared a replayable commit log? This is the basic notion behind one of the most interesting and provocative ideas to arise from the stream-processing community.
Ben Stopford explains how an event stream—stored in a replayable log—can be used as a source of truth, incorporating the retentive properties of a database in a system designed to share data across many teams, cloud providers, or geographies. This leads to the idea of a database turned inside out: a central commit log spawning many continuously updated caches and views, embedded in different microservices. Ben examines the subtler, systemic effects that the pattern leads to—better autonomy, easier evolution and a more ephemeral approach to data—and explores the use of logs that span geographical regions and cloud providers. Along the way, he reflects on the practicalities of using logs as a distributed storage system and looks at some of the real-world applications of this approach.
7. Event Storage + Stream Processing + Scalability
• Event Storage – Kafka stores petabytes of data
• Stream Processing – real-time processing over streams and tables
• Scalability – clusters of hundreds of machines; global
• Messaging + …
9. Formula 1 – Race Telemetry
• 400 sensors on the car
• 70,000 derivative measures
• Events streamed back from the race track to HQ
• Analyzed in real time with stream processing:
  • Tire modelling (temperature, pressure, suspension compression)
  • Racing line
  • Aerodynamics
  • Machine learning and physics models
• Replayed later for post-race analysis
15. A Shopping Cart as Events
Shopping cart events, in time order:
• 2 trousers added
• 1 jumper added
• 1 trousers removed
• 1 hat added
• Checkout
The current cart is derived by replaying the events.
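The cart above can be derived by folding over its events in order. A minimal sketch in plain Python (the event tuples and the `replay` function are illustrative, not from the talk):

```python
from collections import Counter

def replay(events):
    """Derive the current cart state by replaying events in time order."""
    cart = Counter()
    for action, qty, item in events:
        if action == "added":
            cart[item] += qty
        elif action == "removed":
            cart[item] -= qty
    # Drop items whose quantity has fallen to zero.
    return {item: qty for item, qty in cart.items() if qty > 0}

events = [
    ("added", 2, "trousers"),
    ("added", 1, "jumper"),
    ("removed", 1, "trousers"),
    ("added", 1, "hat"),
]
cart = replay(events)
# cart == {"trousers": 1, "jumper": 1, "hat": 1}
```

Because the events are the record, the same journal can later be replayed into a different shape, such as "items ever removed", without any migration.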
16. Traditional Event Sourcing
(Store raw events in a database in time order)
[Diagram: apps save every state change as an event, building a journal of every state change]
17. Traditional Event Sourcing
(Derive current state from truthful events)
[Diagram: apps apply a projection over the saved events to query by customer ID]
- Projection applied on read
- Constantly re-derived from truthful events
- No schema migration
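"Projection applied on read" means state is recomputed from the journal on every query, so there is nothing to migrate when the shape of the state changes. A toy sketch, where the journal and its field names are invented for illustration:

```python
# Journal of every state change, append-only and in time order.
journal = [
    {"customer": "c1", "field": "email", "value": "a@x.com"},
    {"customer": "c2", "field": "email", "value": "b@y.com"},
    {"customer": "c1", "field": "email", "value": "a2@x.com"},
]

def query_by_customer(customer_id):
    """Re-derive current state on every read by replaying the customer's events."""
    state = {}
    for event in journal:
        if event["customer"] == customer_id:
            state[event["field"]] = event["value"]  # last write wins
    return state

# query_by_customer("c1") == {"email": "a2@x.com"}
```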
18. Using Kafka: A Distributed Log
[Diagram: a streaming platform at the centre; all events stored indefinitely]
19. Using Kafka: Log, but no query
[Diagram: events keyed by CustomerId accumulate in the log, but the log itself can’t be queried by CustomerId]
20. CQRS with Kafka
Using events to build a view (DB, cache, or stream processor)
[Diagram: events accumulate in the log; a projection (stream processor) maintains a view that can be queried by customer ID]
- Event stream is the source of truth
- View can be a DB, cache, or stateful stream processor
- View can be re-derived from the event stream
http://bit.ly/kafka-microservice-examples
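In CQRS the projection runs continuously on the write path rather than the read path: a stream processor consumes the log and maintains a view keyed by customer ID, so a query becomes a cheap lookup. A sketch in plain Python standing in for a Kafka Streams projection (the event shapes and names are illustrative):

```python
view = {}  # materialized view, keyed by customer id

def project(event):
    """Projection side: fold each event into the view as it arrives."""
    orders = view.setdefault(event["customerId"], [])
    orders.append(event["orderId"])

def query(customer_id):
    """Query side: a keyed lookup, no replay needed."""
    return view.get(customer_id, [])

log = [
    {"customerId": "c1", "orderId": "o1"},
    {"customerId": "c2", "orderId": "o2"},
    {"customerId": "c1", "orderId": "o3"},
]
for event in log:  # in Kafka this loop would be the consumer poll loop
    project(event)
# query("c1") == ["o1", "o3"]
```

Because the log is retained, the `view` dict can be thrown away and rebuilt at any time by re-running the loop, which is what "view can be re-derived from the event stream" means.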
22. New York Times
• Kafka is the source of truth: every article since 1851
• Normalized assets (images, articles, bylines, tags are all separate messages)
• Denormalized into a “Content View”
https://www.confluent.io/blog/publishing-apache-kafka-new-york-times/
24. Alternate Approach: “Write Through”
(Event model in DB, CDC Connector)
[Diagram: apps write to and query a database; every write becomes an event on the streaming platform]
Note:
- The database is now the source of truth.
- Events are a “cache” available to others.
- Users can read their own writes immediately (not true of CQRS).
Common in practice.
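A write-through sketch: the database commit happens first, so readers see their own writes immediately, and each committed write is then published as an event. Plain Python stands in for the database and the connector; in practice a CDC connector would tail the database's own log rather than dual-write, and all names here are invented:

```python
database = {}   # source of truth
event_log = []  # the "cache" of changes, available to other services

def write(key, value):
    database[key] = value           # commit to the DB first...
    event_log.append((key, value))  # ...then emit the change event

def read(key):
    return database[key]  # read-your-writes: served straight from the DB

write("order:1", {"status": "paid"})
# read("order:1") returns the new value immediately,
# and event_log now carries the change for downstream consumers.
```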
26. We can repurpose the event stream
[Diagram: the same event stream, the source of truth, feeds both a shipping service’s view and a full-text search view]
27. Join datasets from many different sources in real time
[Diagram: a fraud service joins events from the orders, payment, and customer services via the event log; the projection is created with the Kafka Streams API]
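The fraud projection joins the streams by key. In the Kafka Streams API this would be a stream or table join; here is a minimal dictionary-based sketch of the same idea, with invented event shapes and a toy risk rule:

```python
# Latest state from each source stream, keyed for joining.
orders = {"o1": {"customerId": "c1", "amount": 950}}
payments = {"o1": {"method": "new_card"}}
customers = {"c1": {"accountAgeDays": 2}}

def fraud_check(order_id):
    """Join the three streams on their keys, then score the joined record."""
    order = orders[order_id]
    payment = payments[order_id]
    customer = customers[order["customerId"]]
    # Toy rule: a large order from a brand-new account on a new card is risky.
    return (order["amount"] > 500
            and customer["accountAgeDays"] < 7
            and payment["method"] == "new_card")

# fraud_check("o1") is True for this data
```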
28. Create Aggregate Streams
(easier to consume; keeps apps stateless)
[Diagram: order, payment, and customer events are combined into a single aggregate event stream, consumed by downstream apps]
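An aggregate stream denormalizes the separate order, payment, and customer events into one enriched record, so downstream consumers stay stateless: they never have to perform the join themselves. A sketch with invented field names:

```python
def aggregate(order, payment, customer):
    """Merge three source events into one self-contained aggregate event."""
    return {
        "orderId": order["orderId"],
        "items": order["items"],
        "paid": payment["status"] == "settled",
        "email": customer["email"],
    }

order = {"orderId": "o1", "items": ["hat"], "customerId": "c1"}
payment = {"orderId": "o1", "status": "settled"}
customer = {"customerId": "c1", "email": "a@x.com"}

event = aggregate(order, payment, customer)
# event now carries everything a consumer needs in a single message
```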
29. Services flex around a central source of truth
- Historical and real-time data are both self-service (pluggable)
- Source systems don’t need to republish
- Views are use-case specific / decoupled / autonomous
- Encourages event-driven design
[Diagram: event-driven services such as billing, shipping, fraud, and fulfilment, each with views derived from the log; a.k.a. Forward Deployed Event Cache, The Database Inside Out]
36. Separate stateful and stateless operations
(Just like you do with a database)
[Diagram: a stateful data layer (KSQL, the source of truth) beneath a stateless application layer; business logic goes in the stateless layer]
37. For the hip and trendy, use FaaS
[Diagram: stateless FaaS functions autoscale on top of the stateful data layer (KSQL)]
39. Writes are typically the limiting factor
Kafka Streams:
• RocksDB: capable of ~10M × 500 B objects per minute on top-end hardware
  (10M × 500 B ≈ 5 GB/min ≈ 83 MB/s, roughly GbE speed)
Regular database:
• Postgres will bulk-load ~1M rows per minute.
(Kafka delivers data at ~network speed)
40. Lean Data – take only the data you need
[Diagram: each service materializes only the slice of the shared event stream that it needs]
If messaging remembers, databases don’t have to.
SELECT O.OrderId, C.Email
FROM ORDERS O, CUSTOMERS C
WHERE O.REGION = 'EU'
AND C.TYPE = 'Platinum'
42. It’s OK to Store Data in Kafka
• Largely built by a guy who built databases (DB2…)
• Log files are immutable once they roll (unless it’s a compacted topic)
• Log is O(1) read, O(1) write
• But care is required: writes can block behind historical scans
  • Some users run dedicated clusters for reading old data
  • ZFS has several page-cache optimizations
  • Tiered storage would help
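Compacted topics are the exception to segment immutability: Kafka periodically rewrites old segments, keeping only the latest record per key, and a null value acts as a tombstone that deletes the key. A sketch of the compaction rule itself, not of Kafka's actual implementation:

```python
def compact(log):
    """Keep only the most recent value for each key; drop tombstoned keys."""
    latest = {}
    for key, value in log:  # later records overwrite earlier ones
        latest[key] = value
    return [(k, v) for k, v in latest.items() if v is not None]

log = [
    ("c1", "v1"),
    ("c2", "v1"),
    ("c1", "v2"),  # supersedes ("c1", "v1")
    ("c2", None),  # tombstone: c2 is deleted
]
# compact(log) == [("c1", "v2")]
```

This is why a compacted topic can serve as a durable table of current state while staying bounded in size.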
49. In summary
• Broadcast events.
• Cache shared datasets in the log and make them discoverable.
• Let users manipulate event streams directly.
• Drive simple microservices, or prepare use-case-specific views in a DB of your choice.
51. Thank you
@benstopford
Microservices blog with associated code
http://bit.ly/kafka-microservice-examples
Book:
https://www.confluent.io/designing-event-driven-systems