The Data Dichotomy:
Rethinking Data and Services
with Streams
Ben Stopford
@benstopford
What are microservices really about?
[Diagram: a GUI / UI service fronting Orders, Returns, Fulfilment, Payment and Stock services]
Splitting the Monolith?
A single process / code base / deployment becomes many processes / code bases / deployments.
Autonomy?
[Diagram: Orders, Stock, Email and Fulfilment services, each independently deployable]
Independence is where services get their value.
Allows scaling, particularly scaling in terms of people.
What happens when we grow?
Companies are inevitably a collection of applications. They must work together to some degree.
Interconnection is an afterthought: FTP, enterprise messaging, etc.
[Diagram: the internal world of a service, separated from the external world by the service boundary]
Services Force Us To Consider The External World
The external world is something we should design for.
But this independence comes at a cost.
$$$
[Diagram: an Orders object exposing getOpenOrders() to a Statement object]
Less Surface Area = Less Coupling
Encapsulate state. Encapsulate functionality.
Consider two objects in one address space.
Encapsulation => Loose Coupling
Change 1, change 2, redeploy: singly-deployable apps are easy.
[Diagram: an Orders service exposing getOpenOrders() to a Statement service, each independently deployable]
Synchronized changes are painful.
Services work best where requirements are isolated in a single bounded context.
[Diagram: a Business service calling authorise() on a Single Sign-On (SSO) service]
It's unlikely that a Business service would need the SSO service's internal state or functions to change.
Single Sign-On has a tightly bounded context.
But business services are different.
[Diagram: a spectrum of bounded contexts, from Catalog to Authorisation]
Most business services share the same core stream of facts. Most services live in this shared middle ground.
The futures of business services are far more tightly intertwined.
We need encapsulation to hide internal state and stay loosely coupled. But we also need the freedom to slice & dice shared data like any other dataset.
But data systems have little to do with encapsulation.
[Diagram: service vs. database. Both keep data on the inside, but a service's interface hides that data, while a database's interface amplifies it into data on the outside.]
Databases amplify the data they hold
The data dichotomy
Data systems are about exposing data.
Services are about hiding it.
Encapsulation protects us against future change. Databases provide ultimate flexibility for the now.
These approaches serve different goals
This affects services in one of two ways.
getOpenOrders(fulfilled=false, deliveryLocation=CA, orderValue=100, operator=GreaterThan)
Either (1) we constantly add to the interface as datasets grow:
getOrder(Id)
getOrder(UserId)
getAllOpenOrders()
getAllOrdersUnfulfilled(Id)
getAllOrders()
(1) Services can end up looking like kooky home-grown databases.
...and DATA amplifies this "God Service" problem.
Or (2) we give up and move whole datasets en masse:
getAllOrders()
Data copied internally.
This leads to a different problem:
Data diverges over time
[Diagram: Orders, Stock, Email and Fulfilment services, each holding its own copy of the data]
The more mutable copies there are, the more the data will diverge over time.
Cycle of inadequacy: these forces compete in the systems we build.
[Flowchart: start here with nice, neat services → the service contract proves too limiting → can we change both services together, easily? Broadening the contract is painful (eek, $$$), so: "Frack it! Just give me ALL the data" → a shared database → data diverges (many different versions of the same facts) → "let's encapsulate, let's centralise to one copy" → back to nice, neat services.]
[Diagram: a triangle of competing forces: divergence, accessibility, encapsulation. A shared database gives better accessibility; service interfaces give better independence.]
No perfect solution
Is there a better way?
[Diagram: data on the inside, separated from data on the outside by the service boundary]
Make data-on-the-outside a first-class citizen.
Separate reads & writes with immutable streams.
[Diagram: Orders, Stock, Fulfilment, UI and Fraud services, all connected through Kafka]
Event Driven + CQRS + Stateful Stream Processing
Commands go to owning services. Queries are interpreted entirely inside each bounded context.
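As an illustration, a minimal sketch of the write side, not from the talk: a client routes a command to the topic owned by the Orders service. The topic name "orders-commands" and the JSON payload are assumptions.

```java
// Minimal sketch (assumed topic name and payload): commands are routed to
// the topic owned by the Orders service, which is its sole writer-owner.
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class SubmitOrderCommand {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Keying by orderId keeps all events for one order strongly ordered.
            producer.send(new ProducerRecord<>("orders-commands", "order-42",
                    "{\"type\":\"SubmitOrder\",\"value\":100}"));
        } // close() flushes any pending sends
    }
}
```

Each service then answers queries from its own locally built view of these events, rather than calling back into the Orders service.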
[Diagram: Orders, Stock, Fulfilment, UI and Fraud services communicating through a shared Orders stream]
•  Events form a central shared narrative
•  Writes/commands => go to the "owning" service
•  Reads/queries => wholly owned by each service
•  Stream processing toolkit
Kafka is a Streaming Platform
[Diagram: the log at the centre, with producers, consumers, connectors and a streaming engine around it]
Kafka is a type of messaging system: writers produce to Kafka; readers consume from it.
Important properties:
•  Petabytes of data
•  Scales out linearly
•  Built-in fault tolerance
•  The log can be replayed
Scaling becomes a concern of the broker, not the service.
Services naturally load balance.
This provides Fault Tolerance
Reset to any point in the shared narrative.
Rewind & Replay
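A minimal sketch of what rewind & replay looks like in practice; the "orders" topic and group id are assumptions. The consumer seeks back to the start of the log whenever partitions are assigned, then rebuilds its view from the full narrative.

```java
// Minimal sketch (assumed topic/group names): rebuild a view by rewinding
// to the beginning of the shared narrative and replaying it.
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import java.time.Duration;
import java.util.Collection;
import java.util.List;
import java.util.Properties;

public class ReplayConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "orders-view-rebuild");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(List.of("orders"), new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) { }
            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                // reset to any point in the narrative; here, the very beginning
                consumer.seekToBeginning(partitions);
            }
        });
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            records.forEach(r -> System.out.printf("%s -> %s%n", r.key(), r.value()));
        }
    }
}
```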
Compacted log (retains only the latest version of each key)
[Diagram: per-key versions in the log compacted down to just the newest version of each key]
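As a sketch of how such a topic is configured (the topic name, partition count and replication factor are assumptions), using Kafka's AdminClient:

```java
// Minimal sketch (assumed topic name and sizing): create a compacted topic
// that retains only the latest record per key.
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;
import java.util.Map;
import java.util.Properties;
import java.util.Set;

public class CreateCompactedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            NewTopic stockLevels = new NewTopic("stock-levels", 6, (short) 3)
                    .configs(Map.of(TopicConfig.CLEANUP_POLICY_CONFIG,
                                    TopicConfig.CLEANUP_POLICY_COMPACT));
            admin.createTopics(Set.of(stockLevels)).all().get();
        }
    }
}
```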
Service Backbone
Scalable, Fault Tolerant, Concurrent, Strongly Ordered, Retentive
A place to keep the data-on-the-outside as an immutable narrative.
Now add stream processing.
Kafka’s Streaming API
A general, embeddable streaming engine
What is Stream Processing?
A machine for combining and processing streams of events.
Features, similar to a database query engine: join, filter, aggregate, view, window.
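For instance, a minimal Kafka Streams sketch of the filter operator; the topic names and the crude JSON predicate are assumptions, not from the talk:

```java
// Minimal sketch (assumed topics and predicate): derive a filtered stream
// of Californian orders from the full order stream.
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import java.util.Properties;

public class CaOrderFilter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "ca-order-filter");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> orders = builder.stream("orders");
        orders.filter((orderId, json) -> json.contains("\"deliveryLocation\":\"CA\"")) // Filter
              .to("ca-orders");                                // a new derived stream
        new KafkaStreams(builder.build(), props).start();
    }
}
```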
Stateful stream processing adds tables & local state.
[Diagram: an event-driven service joining a stream with a compacted stream from Kafka, combining stream data and stream-tabular data in its domain logic via tables/views]
avg(o.time - p.time)
from orders o, payment p, user u
group by user.region
over a 1-day window, emitting every second
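A hedged Kafka Streams sketch of the same shape of query, not a full translation: to stay short it assumes a pre-joined input topic, "order-payment-latencies", keyed by region with Long latency values (a real version would first join orders, payments and users). Uses a recent Kafka Streams API.

```java
// Minimal sketch (assumed pre-joined input; recent Kafka Streams API):
// average (payment time - order time) per region over a 1-day window.
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Printed;
import org.apache.kafka.streams.kstream.TimeWindows;
import java.time.Duration;
import java.util.Properties;

public class RegionLatency {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "region-latency");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("order-payment-latencies",
                        Consumed.with(Serdes.String(), Serdes.Long()))
               .groupByKey(Grouped.with(Serdes.String(), Serdes.Long()))
               .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofDays(1)))
               // accumulate "count:sum" per region per window...
               .aggregate(() -> "0:0",
                          (region, latencyMs, acc) -> {
                              String[] parts = acc.split(":");
                              long count = Long.parseLong(parts[0]) + 1;
                              long sum = Long.parseLong(parts[1]) + latencyMs;
                              return count + ":" + sum;
                          },
                          Materialized.with(Serdes.String(), Serdes.String()))
               // ...then emit the running average; updates flow continuously
               .mapValues(acc -> {
                   String[] parts = acc.split(":");
                   return Double.parseDouble(parts[1]) / Double.parseDouble(parts[0]);
               })
               .toStream()
               .print(Printed.toSysOut());

        new KafkaStreams(builder.build(), props).start();
    }
}
```

The "emitting every second" behaviour corresponds to Kafka Streams' continuous update model: each window update flows downstream as it happens, subject to record caching.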
Stateful Stream Processing
[Diagram: streams flow into the stream processing engine, which derives a table used by the domain logic]
Tools make it easy to join streams of events (or tables of data) from different services.
[Diagram: an Email service and a legacy app using Kafka Streams (KStreams) to join the Orders, Payments and Stock topics from Kafka]
So we might join two event streams and create a local store in our own domain model.
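A minimal sketch of such a join; topic names, serdes and the one-hour join window are assumptions, and it uses a recent Kafka Streams API:

```java
// Minimal sketch (assumed topics, serdes and window): join purchase and
// payment events by orderId within one hour, then materialise the result
// as a named local store inside our own domain model.
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.JoinWindows;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;
import java.time.Duration;
import java.util.Properties;

public class PurchasePaymentsJoin {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "purchase-payments");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> purchases = builder.stream("purchases");
        KStream<String, String> payments = builder.stream("payments");

        purchases.join(payments,
                       (purchase, payment) -> purchase + "|" + payment, // combine the two events
                       JoinWindows.ofTimeDifferenceWithNoGrace(Duration.ofHours(1)))
                 // a queryable local table, private to this bounded context
                 .toTable(Materialized.as("purchase-payments-store"));

        new KafkaStreams(builder.build(), props).start();
    }
}
```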
[Diagram: the Orders service joining Payment events and Purchase events into PurchasePayments]
Make the projections queryable
[Diagram: the Orders service, inside its context boundary, exposing state derived from Kafka through the Queryable State API]
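A minimal sketch of reading that projection through Kafka Streams' interactive queries (the current name for the Queryable State API); the store name matches the join sketch above, and `streams` is assumed to be a running KafkaStreams instance:

```java
// Minimal sketch (recent Kafka Streams API): serve reads directly from the
// local state materialised by the topology, with no external database.
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class OrderQueries {
    private final ReadOnlyKeyValueStore<String, String> store;

    public OrderQueries(KafkaStreams streams) {
        // look up the store by the name given in Materialized.as(...)
        this.store = streams.store(StoreQueryParameters.fromNameAndType(
                "purchase-payments-store", QueryableStoreTypes.keyValueStore()));
    }

    public String purchaseWithPayment(String orderId) {
        return store.get(orderId); // the query is answered inside this bounded context
    }
}
```

In a multi-instance deployment the state is partitioned, so a query layer would first route each key to the instance that owns it.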
Blend the concepts of services communicating and services sharing state.
[Diagram: customer, orders, catalogue and payments data flowing between services]
Very different to sharing a database
Services communicate and share data via the log.
Slicing & dicing is embedded in each bounded context.
But sometimes you have to move data.
It's easy to drop out to an external database if needed.
[Diagram: a Fraud service, inside its context boundary, using the Kafka Connect API to sink data into a DB of choice, e.g. Oracle]
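For example, a hedged sketch of a Kafka Connect sink configuration using Confluent's JDBC sink connector; the connection URL, topic and key field are assumptions:

```properties
# Minimal sketch: sink the orders topic into an external JDBC database.
# The connection URL is an assumed Oracle instance; swap in the DB of choice.
name=orders-jdbc-sink
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
topics=orders
connection.url=jdbc:oracle:thin:@localhost:1521/ORCL
# create the target table if missing, and upsert keyed on the record key
auto.create=true
insert.mode=upsert
pk.mode=record_key
pk.fields=order_id
```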
The Connector API & open-source ecosystem make this easier.
[Diagram: connectors moving data between the log and external stores such as Cassandra]
WIRED Principles
•  Wimpy: start simple, lightweight and fault tolerant.
•  Immutable: build a retentive, shared narrative.
•  Reactive: leverage asynchronicity. Focus on the now.
•  Evolutionary: use only the data you need today.
•  Decentralized: receiver-driven, coordination-avoiding. No God services.
So…
Architecture is really about designing for change.
The data dichotomy
Data systems are about exposing data.
Services are about hiding it.
Remember: shared data is a reality for most organisations.
Embrace data that lives and flows between services: canonical, immutable streams.
Embed the ability to slice and dice into each bounded context.
[Diagram: the triangle of forces revisited: a shared database offers better accessibility, service interfaces offer better independence, and an event-driven service on the distributed log balances the two]
Balance the Data Dichotomy
Create a canonical shared narrative.
Use stream processing to handle data-on-the-outside, whilst prioritizing the NOW.
Event Driven Services
[Diagram: the log (streams & tables) at the centre, feeding ingestion services, services with polyglot persistence, simple services and streaming services]
Twitter:
@benstopford
Blog of this talk at
http://confluent.io/blog

Kafka Summit NYC 2017 - The Data Dichotomy: Rethinking Data and Services with Streams