The Data Dichotomy: Rethinking Data and Services with Streams
Ben Stopford (@benstopford)
Strata Software Architecture NY: The Data Dichotomy

Ben Stopford is an engineer and architect working on the Apache Kafka Core Team at Confluent (the company behind Apache Kafka). A specialist in data, both from a technology and an organizational perspective, Ben previously spent five years leading data integration at a large investment bank, using a central streaming database. His earlier career spanned a variety of projects at Thoughtworks and UK-based enterprise companies. He writes at Benstopford.com.


Strata Software Architecture NY: The Data Dichotomy

  1. The Data Dichotomy: Rethinking Data and Services with Streams. Ben Stopford, @benstopford
  2. What are microservices really about?
  3. Splitting the Monolith? Single Process / Code base / Deployment versus Many Processes / Code bases / Deployments. (Diagram labels: GUI; UI Service, Orders Service, Returns Service, Fulfilment Service, Payment Service, Stock Service.)
  4. Autonomy? (Diagram label: Orders Service.)
  5. Orders Service, Stock Service, Email Service, Fulfillment Service: each Independently Deployable.
  6. Independence is where services get their value
  7. Allows Scaling
  8. Scaling in terms of people. What happens when we grow?
  9. Companies are inevitably a collection of applications. They must work together to some degree.
  10. Interconnection is an afterthought: FTP / Enterprise Messaging etc.
  11. Services Force Us To Consider The External World. (Diagram labels: Internal World inside a Service Boundary, with the External World on every side.)
  12. External World is something we should Design For. (Diagram labels: Internal World inside a Service Boundary, with the External World on every side.)
  13. But this independence comes at a cost: $$$
  14. Consider Two Objects in one address space: Orders Object and Statement Object, connected by getOpenOrders(). Encapsulate State. Encapsulate Functionality. Less Surface Area = Less Coupling. Encapsulation => Loose Coupling.
  15. Change 1, Change 2, Redeploy. Singly-deployable apps are easy.
  16. Orders Service and Statement Service, connected by getOpenOrders(), each Independently Deployable. Synchronized changes are painful.
  17. Services work best where requirements are isolated in a single bounded context
  18. Single Sign On and Business Service, connected by authorise(). It’s unlikely that a Business Service would need the internal SSO state / function to change.
  19. SSO has a tightly bounded context
  20. But business services are different
  21. Most business services share the same core stream of facts. Most services live in here. (Diagram labels: Catalog, Authorisation.)
  22. The futures of business services are far more tightly intertwined.
  23. We need encapsulation to hide internal state. Be loosely coupled. But we need the freedom to slice & dice shared data like any other dataset.
  24. But data systems have little to do with encapsulation
  25. Service: data on the inside, data on the outside; the interface hides data. Database: data on the inside, data on the outside; the interface amplifies data. Databases amplify the data they hold.
  26. The data dichotomy: Data systems are about exposing data. Services are about hiding it.
  27. Service versus Database (data on the inside, data on the outside). Encapsulation protects us against future change. Databases provide ultimate flexibility for the now. These approaches serve different goals.
  28. This affects services in one of two ways
  29. Either (1) we constantly add to the interface, as datasets grow: getOpenOrders(fulfilled=false, deliveryLocation=CA, orderValue=100, operator=GreaterThan)
  30. (1) Services can end up looking like kooky home-grown databases: getOrder(Id), getOrder(UserId), getAllOpenOrders(), getAllOrdersUnfulfilled(Id), getAllOrders()
  31. ...and DATA amplifies this “God Service” problem
  32. (2) Give up and move whole datasets en masse: getAllOrders(). Data Copied Internally.
  33. This leads to a different problem
  34. Data diverges over time. (Diagram labels: Orders Service, Stock Service, Email Service, Fulfillment Service.)
  35. The more mutable copies, the more data will diverge over time
  36. Cycle of inadequacy. (Flowchart labels: Start here; Nice neat services; Service contract too limiting; Can we change both services together, easily? YES / NO; Broaden Contract; Eek $$$; Is it a shared Database? YES / NO; Frack it! Just give me ALL the data; Data diverges (many different versions of the same facts); Let's encapsulate; Let's centralise to one copy.)
  37. These forces compete in the systems we build: Divergence, Accessibility, Encapsulation
  38. No perfect solution: Shared database (Better Accessibility) versus Service Interfaces (Better Independence).
  39. Is there a better way?
  40. Data on the Inside sits within the Service Boundary; Data on the Outside surrounds it.
  41. Make data-on-the-outside a 1st class citizen
  42. Separate Reads & Writes with Immutable Streams
  43. Event Driven + CQRS + Stateful Stream Processing. Commands go to owning services. Queries are interpreted entirely inside each bounded context. (Diagram labels: Orders Service, Stock Service, Fulfilment Service, UI Service, Fraud Service, Kafka.)
  44. Event Driven + CQRS + Stateful Stream Processing. (Diagram labels: Orders Service, Stock Service, Orders, Fulfilment Service, UI Service, Fraud Service.)
      • Events form a central Shared Narrative
      • Writes/Commands => Go to “owning” service
      • Reads/Queries => Wholly owned by each Service
      • Stream processing toolkit
  45. Kafka is a Streaming Platform
  46. Kafka: a Streaming Platform. (Diagram labels: The Log, Connectors, Producer, Consumer, Streaming Engine.)
  47. The Log
  48. Kafka is a type of messaging system. (Diagram labels: Writers, Kafka, Readers.) Important Properties:
      • Petabytes of data
      • Scales out linearly
      • Built in fault tolerance
      • Log can be replayed
  49. Scaling becomes a concern of the broker, not the service
  50. Services Naturally Load Balance
  51. This provides Fault Tolerance
  52. Rewind & Replay: Reset to any point in the shared narrative
  53. Compacted Log (retains only latest version). (Diagram labels: per-key version histories, Version 1 to Version 3, Version 1 to Version 2, Version 1 to Version 5.)
  54. Service Backbone: Scalable, Fault Tolerant, Concurrent, Strongly Ordered, Retentive. (Diagram labels: The Log, Connectors, Producer, Consumer, Streaming Engine.)
  55. A place to keep the data-on-the-outside as an immutable narrative
  56. Now add Stream Processing
  57. Kafka’s Streaming API: a general, embeddable streaming engine. (Diagram labels: The Log, Connectors, Producer, Consumer, Streaming Engine.)
  58. What is Stream Processing? A machine for combining and processing streams of events
  59. Features: similar to database query engine. Join, Filter, Aggregate, View, Window.
  60. Stateful Stream Processing adds Tables & Local State. (Diagram labels: stream; compacted stream; Join; Stream Data; Stream-Tabular Data; Domain Logic; Tables / Views; Kafka; Event Driven Service.)
  61. Stateful Stream Processing: Avg(o.time - p.time) From orders, payment, user Group by user.region over 1 day window emitting every second. (Diagram labels: Streams; Stream Processing Engine; Derived “Table”; Table; Domain Logic.)
  62. Tools make it easy to join streams of events (or tables of data) from different services. (Diagram labels: Email Service, Legacy App, Orders, Payments, Stock, KSTREAMS, Kafka.)
  63. So we might join two event streams and create a local store in our own Domain Model. (Diagram labels: Orders Service; Payment Events; Purchase Events; PurchasePayments.)
  64. Make the projections queryable. (Diagram labels: Orders Service, Kafka, Queryable State API, Context Boundary.)
  65. Blend the concepts of services communicating and services sharing state. (Diagram labels: customer orders, catalogue, payments.)
  66. Very different to sharing a database. Services communicate and share data via the log. Slicing & Dicing is embedded in each bounded context. (Diagram labels: customer orders, catalogue, payments.)
  67. But sometimes you have to move data
  68. Easy to drop out to an external database if needed. (Diagram labels: DB of choice, Kafka, Connect API, Fraud Service, Context Boundary.)
  69. Connector API & Open Source Ecosystem make this easier. (Diagram labels: Oracle, Cassandra, The Log, Connector, Connector, Producer, Consumer, Streaming Engine.)
  70. WIRED Principles
      • Wimpy: Start simple, lightweight and fault tolerant.
      • Immutable: Build a retentive, shared narrative.
      • Reactive: Leverage Asynchronicity. Focus on the now.
      • Evolutionary: Use only the data you need today.
      • Decentralized: Receiver driven. Coordination avoiding. No God services
  71. So…
  72. Architecture is really about designing for change
  73. Remember the data dichotomy: Data systems are about exposing data. Services are about hiding it.
  74. Shared data is a reality for most organisations. (Diagram label: Catalog.)
  75. Embrace data that lives and flows between services
  76. Canonical Immutable Streams. Embed the ability to slice and dice into each bounded context. (Diagram labels: Distributed Log, Event Driven Service.)
  77. Balance the Data Dichotomy: Shared database (Better Accessibility) versus Service Interfaces (Better Independence).
  78. Create a canonical shared narrative
  79. Use stream processing to handle data-on-the-outside whilst prioritizing the NOW
  80. Event Driven Services: The Log (streams & tables), Ingestion Services, Services with Polyglot persistence, Simple Services, Streaming Services.
  81. Twitter: @benstopford. Blog of this talk at http://confluent.io/blog
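
Slides 47 to 53 describe the log as something readers can rewind and replay, and a compacted log as one that retains only the latest version of each key. A minimal in-memory sketch of those two ideas follows. The `Log` class and its method names are hypothetical illustrations, not the Kafka API, and the sketch renumbers offsets after compaction for brevity, whereas a real compacted topic keeps the original offsets.

```python
class Log:
    """Append-only, single-partition log (hypothetical stand-in, not Kafka)."""

    def __init__(self):
        self._records = []  # each entry is a (key, value) pair

    def append(self, key, value):
        """Add a record at the end and return its offset."""
        self._records.append((key, value))
        return len(self._records) - 1

    def read_from(self, offset=0):
        """Yield (offset, key, value) from the given offset onwards.

        The log holds no reader state, so any reader can rewind
        to an earlier offset and replay from there.
        """
        for i in range(offset, len(self._records)):
            key, value = self._records[i]
            yield i, key, value

    def compacted(self):
        """Return a new Log that keeps only the latest record per key."""
        latest = {}
        for offset, key, _ in self.read_from(0):
            latest[key] = offset  # later offsets overwrite earlier ones
        out = Log()
        for offset, key, value in self.read_from(0):
            if latest[key] == offset:
                out.append(key, value)
        return out


log = Log()
log.append("order-1", "v1")
log.append("order-2", "v1")
log.append("order-1", "v2")
log.append("order-1", "v3")

# Replay the full history of one key from the start of the log.
replayed = [value for _, key, value in log.read_from(0) if key == "order-1"]
# Rewind to an arbitrary offset and read forward from there.
tail = list(log.read_from(2))
# Compaction keeps one record per key: the most recent one.
snapshot = list(log.compacted().read_from(0))
```

Because readers track their own position, adding a new consuming service does not require any change to the service that writes the records.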
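
Slides 61 to 64 describe joining event streams into a local, queryable table and aggregating over it (the average gap between order and payment, grouped by region). The sketch below illustrates the shape of that idea in plain Python rather than with a stream processing engine. The event fields (`type`, `order_id`, `region`, `time`) and both function names are assumptions made for illustration. It omits the one-day window and per-second emission from the slide, and it computes payment time minus order time so that delays come out positive, which is the reverse of the order written on the slide.

```python
from collections import defaultdict


def build_view(events):
    """Fold interleaved order and payment events into a table keyed by order id."""
    view = {}
    for event in events:
        row = view.setdefault(event["order_id"], {})
        if event["type"] == "order":
            row["region"] = event["region"]
            row["ordered_at"] = event["time"]
        elif event["type"] == "payment":
            row["paid_at"] = event["time"]
    return view


def avg_payment_delay_by_region(view):
    """Average (paid_at - ordered_at) per region, over rows where both sides arrived."""
    totals = defaultdict(lambda: [0.0, 0])  # region -> [sum of delays, count]
    for row in view.values():
        if "ordered_at" in row and "paid_at" in row:
            bucket = totals[row["region"]]
            bucket[0] += row["paid_at"] - row["ordered_at"]
            bucket[1] += 1
    return {region: total / count for region, (total, count) in totals.items()}


# Hypothetical events, as they might be read from two topics.
events = [
    {"type": "order", "order_id": 1, "region": "CA", "time": 10},
    {"type": "order", "order_id": 2, "region": "CA", "time": 20},
    {"type": "payment", "order_id": 1, "time": 14},
    {"type": "order", "order_id": 3, "region": "NY", "time": 30},
    {"type": "payment", "order_id": 2, "time": 26},
]

view = build_view(events)                   # the local store, owned by this service
delays = avg_payment_delay_by_region(view)  # a query answered inside the service
```

The view lives entirely inside the consuming service, so the query can change without any change to the services that emit the events.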