The Data Dichotomy:
Rethinking Data and Services
with Streams
Ben Stopford
@benstopford
What are microservices really about?
[Diagram: a GUI / UI service fronting Orders, Returns, Fulfilment, Payment and Stock services]
Splitting the Monolith?
A single process / code base / deployment becomes many processes / code bases / deployments.
Autonomy?
[Diagram: Orders, Stock, Email and Fulfilment services, each independently deployable]
Independence is where services get their value.
Allows scaling, particularly scaling in terms of people.
What happens when we grow?
Companies are inevitably a collection of applications. They must work together to some degree.
Interconnection is an afterthought: FTP, enterprise messaging, etc.
[Diagram: the internal world of a service, separated from the external world by the service boundary]
Services Force Us To Consider The External World
The external world is something we should design for.
But this independence comes at a cost.
$$$
[Diagram: an Orders object exposing getOpenOrders() to a Statement object]
Less Surface Area = Less Coupling
Encapsulate state. Encapsulate functionality.
Consider two objects in one address space.
Encapsulation => Loose Coupling
Change 1, change 2, redeploy: singly-deployable apps are easy.
[Diagram: an Orders service exposing getOpenOrders() to a Statement service, each independently deployable]
Synchronized changes are painful.
Services work best where requirements are isolated in a single bounded context.
[Diagram: a Business service calling authorise() on a Single Sign-On (SSO) service]
It's unlikely that a Business service would need the SSO service's internal state or functions to change.
Single Sign-On has a tightly bounded context.
But business services are different.
[Diagram: a spectrum of bounded contexts, from Catalog to Authorisation]
Most business services share the same core stream of facts. Most services live in this shared middle ground.
The futures of business services are far more tightly intertwined.
We need encapsulation to hide internal state and stay loosely coupled. But we also need the freedom to slice & dice shared data like any other dataset.
But data systems have little to do with encapsulation.
[Diagram: service vs. database. Both keep data on the inside, but a service's interface hides that data, while a database's interface amplifies it into data on the outside.]
Databases amplify the data they hold
The data dichotomy
Data systems are about exposing data.
Services are about hiding it.
Encapsulation protects us against future change. Databases provide ultimate flexibility for the now.
These approaches serve different goals
This affects services in one of two ways.
getOpenOrders(fulfilled=false, deliveryLocation=CA, orderValue=100, operator=GreaterThan)
Either (1) we constantly add to the interface as datasets grow:
getOrder(Id)
getOrder(UserId)
getAllOpenOrders()
getAllOrdersUnfulfilled(Id)
getAllOrders()
(1) Services can end up looking like kooky home-grown databases.
...and DATA amplifies this "God Service" problem.
Or (2) we give up and move whole datasets en masse:
getAllOrders()
Data copied internally.
This leads to a different problem:
Data diverges over time
[Diagram: Orders, Stock, Email and Fulfilment services, each holding its own copy of the data]
The more mutable copies there are, the more the data will diverge over time.
Cycle of inadequacy: these forces compete in the systems we build.
[Flowchart: start here with nice, neat services → the service contract proves too limiting → can we change both services together, easily? Broadening the contract is painful (eek, $$$), so: "Frack it! Just give me ALL the data" → a shared database → data diverges (many different versions of the same facts) → "let's encapsulate, let's centralise to one copy" → back to nice, neat services.]
[Diagram: a triangle of competing forces: divergence, accessibility, encapsulation. A shared database gives better accessibility; service interfaces give better independence.]
No perfect solution
Is there a better way?
[Diagram: data on the inside, separated from data on the outside by the service boundary]
Make data-on-the-outside a first-class citizen.
Separate reads & writes with immutable streams.
[Diagram: Orders, Stock, Fulfilment, UI and Fraud services, all connected through Kafka]
Event Driven + CQRS + Stateful Stream Processing
Commands go to owning services. Queries are interpreted entirely inside each bounded context.
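As an illustration, a minimal sketch of the write side, not from the talk: a client routes a command to the topic owned by the Orders service. The topic name "orders-commands" and the JSON payload are assumptions.

```java
// Minimal sketch (assumed topic name and payload): commands are routed to
// the topic owned by the Orders service, which is its sole writer-owner.
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class SubmitOrderCommand {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Keying by orderId keeps all events for one order strongly ordered.
            producer.send(new ProducerRecord<>("orders-commands", "order-42",
                    "{\"type\":\"SubmitOrder\",\"value\":100}"));
        } // close() flushes any pending sends
    }
}
```

Each service then answers queries from its own locally built view of these events, rather than calling back into the Orders service.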
[Diagram: Orders, Stock, Fulfilment, UI and Fraud services communicating through a shared Orders stream]
•  Events form a central shared narrative
•  Writes/commands => go to the "owning" service
•  Reads/queries => wholly owned by each service
•  Stream processing toolkit
Kafka is a Streaming Platform
[Diagram: the log at the centre, with producers, consumers, connectors and a streaming engine around it]
Kafka is a type of messaging system: writers produce to Kafka; readers consume from it.
Important properties:
•  Petabytes of data
•  Scales out linearly
•  Built-in fault tolerance
•  The log can be replayed
Scaling becomes a concern of the broker, not the service.
Services naturally load balance.
This provides Fault Tolerance
Reset to any point in the shared narrative.
Rewind & Replay
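A minimal sketch of what rewind & replay looks like in practice; the "orders" topic and group id are assumptions. The consumer seeks back to the start of the log whenever partitions are assigned, then rebuilds its view from the full narrative.

```java
// Minimal sketch (assumed topic/group names): rebuild a view by rewinding
// to the beginning of the shared narrative and replaying it.
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import java.time.Duration;
import java.util.Collection;
import java.util.List;
import java.util.Properties;

public class ReplayConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "orders-view-rebuild");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(List.of("orders"), new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) { }
            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                // reset to any point in the narrative; here, the very beginning
                consumer.seekToBeginning(partitions);
            }
        });
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            records.forEach(r -> System.out.printf("%s -> %s%n", r.key(), r.value()));
        }
    }
}
```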
Compacted log (retains only the latest version of each key)
[Diagram: per-key versions in the log compacted down to just the newest version of each key]
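As a sketch of how such a topic is configured (the topic name, partition count and replication factor are assumptions), using Kafka's AdminClient:

```java
// Minimal sketch (assumed topic name and sizing): create a compacted topic
// that retains only the latest record per key.
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;
import java.util.Map;
import java.util.Properties;
import java.util.Set;

public class CreateCompactedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            NewTopic stockLevels = new NewTopic("stock-levels", 6, (short) 3)
                    .configs(Map.of(TopicConfig.CLEANUP_POLICY_CONFIG,
                                    TopicConfig.CLEANUP_POLICY_COMPACT));
            admin.createTopics(Set.of(stockLevels)).all().get();
        }
    }
}
```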
Service Backbone
Scalable, Fault Tolerant, Concurrent, Strongly Ordered, Retentive
A place to keep the data-on-the-outside as an immutable narrative.
Now add stream processing.
Kafka’s Streaming API
A general, embeddable streaming engine
What is Stream Processing?
A machine for combining and processing streams of events.
Features, similar to a database query engine: join, filter, aggregate, view, window.
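For instance, a minimal Kafka Streams sketch of the filter operator; the topic names and the crude JSON predicate are assumptions, not from the talk:

```java
// Minimal sketch (assumed topics and predicate): derive a filtered stream
// of Californian orders from the full order stream.
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import java.util.Properties;

public class CaOrderFilter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "ca-order-filter");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> orders = builder.stream("orders");
        orders.filter((orderId, json) -> json.contains("\"deliveryLocation\":\"CA\"")) // Filter
              .to("ca-orders");                                // a new derived stream
        new KafkaStreams(builder.build(), props).start();
    }
}
```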
Stateful stream processing adds tables & local state.
[Diagram: an event-driven service joining a stream with a compacted stream from Kafka, combining stream data and stream-tabular data in its domain logic via tables/views]
avg(o.time - p.time)
from orders o, payment p, user u
group by user.region
over a 1-day window, emitting every second
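A hedged Kafka Streams sketch of the same shape of query, not a full translation: to stay short it assumes a pre-joined input topic, "order-payment-latencies", keyed by region with Long latency values (a real version would first join orders, payments and users). Uses a recent Kafka Streams API.

```java
// Minimal sketch (assumed pre-joined input; recent Kafka Streams API):
// average (payment time - order time) per region over a 1-day window.
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Printed;
import org.apache.kafka.streams.kstream.TimeWindows;
import java.time.Duration;
import java.util.Properties;

public class RegionLatency {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "region-latency");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("order-payment-latencies",
                        Consumed.with(Serdes.String(), Serdes.Long()))
               .groupByKey(Grouped.with(Serdes.String(), Serdes.Long()))
               .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofDays(1)))
               // accumulate "count:sum" per region per window...
               .aggregate(() -> "0:0",
                          (region, latencyMs, acc) -> {
                              String[] parts = acc.split(":");
                              long count = Long.parseLong(parts[0]) + 1;
                              long sum = Long.parseLong(parts[1]) + latencyMs;
                              return count + ":" + sum;
                          },
                          Materialized.with(Serdes.String(), Serdes.String()))
               // ...then emit the running average; updates flow continuously
               .mapValues(acc -> {
                   String[] parts = acc.split(":");
                   return Double.parseDouble(parts[1]) / Double.parseDouble(parts[0]);
               })
               .toStream()
               .print(Printed.toSysOut());

        new KafkaStreams(builder.build(), props).start();
    }
}
```

The "emitting every second" behaviour corresponds to Kafka Streams' continuous update model: each window update flows downstream as it happens, subject to record caching.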
Stateful Stream Processing
[Diagram: streams flow into the stream processing engine, which derives a table used by the domain logic]
Tools make it easy to join streams of events (or tables of data) from different services.
[Diagram: an Email service and a legacy app using Kafka Streams (KStreams) to join the Orders, Payments and Stock topics from Kafka]
So we might join two event streams and create a local store in our own domain model.
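A minimal sketch of such a join; topic names, serdes and the one-hour join window are assumptions, and it uses a recent Kafka Streams API:

```java
// Minimal sketch (assumed topics, serdes and window): join purchase and
// payment events by orderId within one hour, then materialise the result
// as a named local store inside our own domain model.
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.JoinWindows;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;
import java.time.Duration;
import java.util.Properties;

public class PurchasePaymentsJoin {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "purchase-payments");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> purchases = builder.stream("purchases");
        KStream<String, String> payments = builder.stream("payments");

        purchases.join(payments,
                       (purchase, payment) -> purchase + "|" + payment, // combine the two events
                       JoinWindows.ofTimeDifferenceWithNoGrace(Duration.ofHours(1)))
                 // a queryable local table, private to this bounded context
                 .toTable(Materialized.as("purchase-payments-store"));

        new KafkaStreams(builder.build(), props).start();
    }
}
```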
[Diagram: the Orders service joining Payment events and Purchase events into PurchasePayments]
Make the projections queryable
[Diagram: the Orders service, inside its context boundary, exposing state derived from Kafka through the Queryable State API]
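A minimal sketch of reading that projection through Kafka Streams' interactive queries (the current name for the Queryable State API); the store name matches the join sketch above, and `streams` is assumed to be a running KafkaStreams instance:

```java
// Minimal sketch (recent Kafka Streams API): serve reads directly from the
// local state materialised by the topology, with no external database.
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class OrderQueries {
    private final ReadOnlyKeyValueStore<String, String> store;

    public OrderQueries(KafkaStreams streams) {
        // look up the store by the name given in Materialized.as(...)
        this.store = streams.store(StoreQueryParameters.fromNameAndType(
                "purchase-payments-store", QueryableStoreTypes.keyValueStore()));
    }

    public String purchaseWithPayment(String orderId) {
        return store.get(orderId); // the query is answered inside this bounded context
    }
}
```

In a multi-instance deployment the state is partitioned, so a query layer would first route each key to the instance that owns it.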
Blend the concepts of services communicating and services sharing state.
[Diagram: customer, orders, catalogue and payments data flowing between services]
Very different to sharing a database
Services communicate and share data via the log.
Slicing & dicing is embedded in each bounded context.
But sometimes you have to move data.
It's easy to drop out to an external database if needed.
[Diagram: a Fraud service, inside its context boundary, using the Kafka Connect API to sink data into a DB of choice, e.g. Oracle]
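For example, a hedged sketch of a Kafka Connect sink configuration using Confluent's JDBC sink connector; the connection URL, topic and key field are assumptions:

```properties
# Minimal sketch: sink the orders topic into an external JDBC database.
# The connection URL is an assumed Oracle instance; swap in the DB of choice.
name=orders-jdbc-sink
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
topics=orders
connection.url=jdbc:oracle:thin:@localhost:1521/ORCL
# create the target table if missing, and upsert keyed on the record key
auto.create=true
insert.mode=upsert
pk.mode=record_key
pk.fields=order_id
```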
The Connector API & open-source ecosystem make this easier.
[Diagram: connectors moving data between the log and external stores such as Cassandra]
WIRED Principles
•  Wimpy: start simple, lightweight and fault tolerant.
•  Immutable: build a retentive, shared narrative.
•  Reactive: leverage asynchronicity. Focus on the now.
•  Evolutionary: use only the data you need today.
•  Decentralized: receiver-driven, coordination-avoiding. No God services.
So…
Architecture is really about designing for change.
The data dichotomy
Data systems are about exposing data.
Services are about hiding it.
Remember: shared data is a reality for most organisations.
Embrace data that lives and flows between services: canonical, immutable streams.
Embed the ability to slice and dice into each bounded context.
[Diagram: the triangle of forces revisited: a shared database offers better accessibility, service interfaces offer better independence, and an event-driven service on the distributed log balances the two]
Balance the Data Dichotomy
Create a canonical shared narrative.
Use stream processing to handle data-on-the-outside, whilst prioritizing the NOW.
Event Driven Services
[Diagram: the log (streams & tables) at the centre, feeding ingestion services, services with polyglot persistence, simple services and streaming services]
Twitter:
@benstopford
Blog of this talk at
http://confluent.io/blog

Kafka Summit NYC 2017 - The Data Dichotomy: Rethinking Data and Services with Streams