One of the biggest challenges for today’s microservice generation is data, which gets split into fragments that are spread across a company, making it hard to get a joined-up view. One solution is to have a single, shared database that all services can access, but sharing databases across different services is a well-known anti-pattern. What if instead you shared a replayable commit log? This is the basic notion behind one of the most interesting and provocative ideas to arise from the stream-processing community.
Ben Stopford explains how an event stream—stored in a replayable log—can be used as a source of truth, incorporating the retentive properties of a database in a system designed to share data across many teams, cloud providers, or geographies. This leads to the idea of a database turned inside out: a central commit log spawning many continuously updated caches and views, embedded in different microservices. Ben examines the subtler, systemic effects that the pattern leads to—better autonomy, easier evolution and a more ephemeral approach to data—and explores the use of logs that span geographical regions and cloud providers. Along the way, he reflects on the practicalities of using logs as a distributed storage system and looks at some of the real-world applications of this approach.
7. Event Storage + Stream Processing + Scalability
• Event Storage – Kafka stores petabytes of data
• Stream Processing – real-time processing over streams and tables
• Scalability – clusters of hundreds of machines; global
• Messaging + …
9. Formula 1 – Race Telemetry
• 400 sensors on the car
• 70,000 derivative measures
• Events streamed back from the race track to HQ
• Analyzed in real time with stream processing:
  • Tire modelling (temperature, pressure, suspension compression)
  • Racing line
  • Aerodynamics
  • Machine learning and physics models
• Replayed later for post-race analysis
15. A Shopping Cart as Events
Shopping cart events, in time order:
• 2 trousers added
• 1 jumper added
• 1 trousers removed
• 1 hat added
• Checkout
The current cart is derived by replaying the events.
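The cart above can be derived by folding over its events in order. A minimal sketch in plain Python (the event tuples and the `replay` function are illustrative, not from the talk):

```python
from collections import Counter

def replay(events):
    """Derive the current cart state by replaying events in time order."""
    cart = Counter()
    for action, qty, item in events:
        if action == "added":
            cart[item] += qty
        elif action == "removed":
            cart[item] -= qty
    # Drop items whose quantity has fallen to zero.
    return {item: qty for item, qty in cart.items() if qty > 0}

events = [
    ("added", 2, "trousers"),
    ("added", 1, "jumper"),
    ("removed", 1, "trousers"),
    ("added", 1, "hat"),
]
cart = replay(events)
# cart == {"trousers": 1, "jumper": 1, "hat": 1}
```

Because the events are the record, the same journal can later be replayed into a different shape, such as "items ever removed", without any migration.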
16. Traditional Event Sourcing
(Store raw events in a database in time order)
[Diagram: apps save every state change as an event, building a journal of every state change]
17. Traditional Event Sourcing
(Derive current state from truthful events)
[Diagram: apps apply a projection over the saved events to query by customer ID]
- Projection applied on read
- Constantly re-derived from truthful events
- No schema migration
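"Projection applied on read" means state is recomputed from the journal on every query, so there is nothing to migrate when the shape of the state changes. A toy sketch, where the journal and its field names are invented for illustration:

```python
# Journal of every state change, append-only and in time order.
journal = [
    {"customer": "c1", "field": "email", "value": "a@x.com"},
    {"customer": "c2", "field": "email", "value": "b@y.com"},
    {"customer": "c1", "field": "email", "value": "a2@x.com"},
]

def query_by_customer(customer_id):
    """Re-derive current state on every read by replaying the customer's events."""
    state = {}
    for event in journal:
        if event["customer"] == customer_id:
            state[event["field"]] = event["value"]  # last write wins
    return state

# query_by_customer("c1") == {"email": "a2@x.com"}
```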
18. Using Kafka: A Distributed Log
[Diagram: a streaming platform at the centre; all events stored indefinitely]
19. Using Kafka: Log, but no query
[Diagram: events keyed by CustomerId accumulate in the log, but the log itself can’t be queried by CustomerId]
20. CQRS with Kafka
Using events to build a view (DB, cache, or stream processor)
[Diagram: events accumulate in the log; a projection (stream processor) maintains a view that can be queried by customer ID]
- Event stream is the source of truth
- View can be a DB, cache, or stateful stream processor
- View can be re-derived from the event stream
http://bit.ly/kafka-microservice-examples
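In CQRS the projection runs continuously on the write path rather than the read path: a stream processor consumes the log and maintains a view keyed by customer ID, so a query becomes a cheap lookup. A sketch in plain Python standing in for a Kafka Streams projection (the event shapes and names are illustrative):

```python
view = {}  # materialized view, keyed by customer id

def project(event):
    """Projection side: fold each event into the view as it arrives."""
    orders = view.setdefault(event["customerId"], [])
    orders.append(event["orderId"])

def query(customer_id):
    """Query side: a keyed lookup, no replay needed."""
    return view.get(customer_id, [])

log = [
    {"customerId": "c1", "orderId": "o1"},
    {"customerId": "c2", "orderId": "o2"},
    {"customerId": "c1", "orderId": "o3"},
]
for event in log:  # in Kafka this loop would be the consumer poll loop
    project(event)
# query("c1") == ["o1", "o3"]
```

Because the log is retained, the `view` dict can be thrown away and rebuilt at any time by re-running the loop, which is what "view can be re-derived from the event stream" means.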
22. New York Times
• Kafka is the source of truth: every article since 1851
• Normalized assets (images, articles, bylines, tags are all separate messages)
• Denormalized into a “Content View”
https://www.confluent.io/blog/publishing-apache-kafka-new-york-times/
24. Alternate Approach: “Write Through”
(Event model in DB, CDC Connector)
[Diagram: apps write to and query a database; every write becomes an event on the streaming platform]
Note:
- The database is now the source of truth.
- Events are a “cache” available to others.
- Users can read their own writes immediately (not true of CQRS).
Common in practice.
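A write-through sketch: the database commit happens first, so readers see their own writes immediately, and each committed write is then published as an event. Plain Python stands in for the database and the connector; in practice a CDC connector would tail the database's own log rather than dual-write, and all names here are invented:

```python
database = {}   # source of truth
event_log = []  # the "cache" of changes, available to other services

def write(key, value):
    database[key] = value           # commit to the DB first...
    event_log.append((key, value))  # ...then emit the change event

def read(key):
    return database[key]  # read-your-writes: served straight from the DB

write("order:1", {"status": "paid"})
# read("order:1") returns the new value immediately,
# and event_log now carries the change for downstream consumers.
```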
26. We can repurpose the event stream
[Diagram: the same event stream, the source of truth, feeds both a shipping service’s view and a full-text search view]
27. Join datasets from many different sources in real time
[Diagram: a fraud service joins events from the orders, payment, and customer services via the event log; the projection is created with the Kafka Streams API]
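The fraud projection joins the streams by key. In the Kafka Streams API this would be a stream or table join; here is a minimal dictionary-based sketch of the same idea, with invented event shapes and a toy risk rule:

```python
# Latest state from each source stream, keyed for joining.
orders = {"o1": {"customerId": "c1", "amount": 950}}
payments = {"o1": {"method": "new_card"}}
customers = {"c1": {"accountAgeDays": 2}}

def fraud_check(order_id):
    """Join the three streams on their keys, then score the joined record."""
    order = orders[order_id]
    payment = payments[order_id]
    customer = customers[order["customerId"]]
    # Toy rule: a large order from a brand-new account on a new card is risky.
    return (order["amount"] > 500
            and customer["accountAgeDays"] < 7
            and payment["method"] == "new_card")

# fraud_check("o1") is True for this data
```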
28. Create Aggregate Streams
(easier to consume; keeps apps stateless)
[Diagram: order, payment, and customer events are combined into a single aggregate event stream, consumed by downstream apps]
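An aggregate stream denormalizes the separate order, payment, and customer events into one enriched record, so downstream consumers stay stateless: they never have to perform the join themselves. A sketch with invented field names:

```python
def aggregate(order, payment, customer):
    """Merge three source events into one self-contained aggregate event."""
    return {
        "orderId": order["orderId"],
        "items": order["items"],
        "paid": payment["status"] == "settled",
        "email": customer["email"],
    }

order = {"orderId": "o1", "items": ["hat"], "customerId": "c1"}
payment = {"orderId": "o1", "status": "settled"}
customer = {"customerId": "c1", "email": "a@x.com"}

event = aggregate(order, payment, customer)
# event now carries everything a consumer needs in a single message
```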
29. Services flex around a central source of truth
- Historical and real-time data are both self-service (pluggable)
- Source systems don’t need to republish
- Views are use-case specific / decoupled / autonomous
- Encourages event-driven design
[Diagram: event-driven services such as billing, shipping, fraud, and fulfilment, each with views derived from the log; a.k.a. Forward Deployed Event Cache, The Database Inside Out]
36. Separate stateful and stateless operations
(Just like you do with a database)
[Diagram: a stateful data layer (KSQL, the source of truth) beneath a stateless application layer; business logic goes in the stateless layer]
37. For the hip and trendy, use FaaS
[Diagram: stateless FaaS functions autoscale on top of the stateful data layer (KSQL)]
39. Writes are typically the limiting factor
Kafka Streams:
• RocksDB: capable of ~10M × 500 B objects per minute on top-end hardware
  (10M × 500 B ≈ 5 GB/min ≈ 83 MB/s, roughly GbE speed)
Regular database:
• Postgres will bulk-load ~1M rows per minute.
(Kafka delivers data at ~network speed)
40. Lean Data – take only the data you need
[Diagram: each service materializes only the slice of the shared event stream that it needs]
If messaging remembers, databases don’t have to.
SELECT O.OrderId, C.Email
FROM ORDERS O, CUSTOMERS C
WHERE O.REGION = 'EU'
AND C.TYPE = 'Platinum'
42. It’s OK to Store Data in Kafka
• Largely built by a guy who built databases (DB2…)
• Log files are immutable once they roll (unless it’s a compacted topic)
• Log is O(1) read, O(1) write
• But care is required: writes can block behind historical scans
  • Some users run dedicated clusters for reading old data
  • ZFS has several page-cache optimizations
  • Tiered storage would help
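Compacted topics are the exception to segment immutability: Kafka periodically rewrites old segments, keeping only the latest record per key, and a null value acts as a tombstone that deletes the key. A sketch of the compaction rule itself, not of Kafka's actual implementation:

```python
def compact(log):
    """Keep only the most recent value for each key; drop tombstoned keys."""
    latest = {}
    for key, value in log:  # later records overwrite earlier ones
        latest[key] = value
    return [(k, v) for k, v in latest.items() if v is not None]

log = [
    ("c1", "v1"),
    ("c2", "v1"),
    ("c1", "v2"),  # supersedes ("c1", "v1")
    ("c2", None),  # tombstone: c2 is deleted
]
# compact(log) == [("c1", "v2")]
```

This is why a compacted topic can serve as a durable table of current state while staying bounded in size.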
49. In summary
• Broadcast events.
• Cache shared datasets in the log and make them discoverable.
• Let users manipulate event streams directly.
• Drive simple microservices, or prepare use-case-specific views in a DB of your choice.
51. Thank you
@benstopford
Microservices blog with associated code
http://bit.ly/kafka-microservice-examples
Book:
https://www.confluent.io/designing-event-driven-systems