
Strata Software Architecture NY: The Data Dichotomy


Ben Stopford is an engineer and architect working on the Apache Kafka Core Team at Confluent (the company behind Apache Kafka). A specialist in data, both from a technology and an organizational perspective, Ben previously spent five years leading data integration at a large investment bank, using a central streaming database. His earlier career spanned a variety of projects at Thoughtworks and UK-based enterprise companies. He writes at Benstopford.com.


Strata Software Architecture NY: The Data Dichotomy

  1. The Data Dichotomy: Rethinking Data and Services with Streams. Ben Stopford @benstopford
  2. What are microservices really about?
  3. Splitting the Monolith? A single process / code base / deployment becomes many processes / code bases / deployments (GUI, UI Service, Orders Service, Returns Service, Fulfilment Service, Payment Service, Stock Service).
  4. Orders Service: Autonomy?
  5. Orders Service, Stock Service, Email Service, Fulfillment Service: each independently deployable.
  6. Independence is where services get their value.
  7. Allows scaling.
  8. Scaling in terms of people: what happens when we grow?
  9. Companies are inevitably a collection of applications. They must work together to some degree.
  10. Interconnection is an afterthought (FTP, enterprise messaging, etc.).
  11. Services force us to consider the external world (internal world vs. external world, separated by the service boundary).
  12. The external world is something we should design for.
  13. But this independence comes at a cost. $$$
  14. Consider two objects in one address space: an Orders Object exposing getOpenOrders() to a Statement Object. Encapsulate state, encapsulate functionality; less surface area = less coupling. Encapsulation => loose coupling.
  15. Singly-deployable apps are easy: change 1, change 2, redeploy.
  16. Orders Service exposing getOpenOrders() to a Statement Service, each independently deployable: synchronized changes are painful.
  17. Services work best where requirements are isolated in a single bounded context.
  18. Single Sign On exposing authorise() to a Business Service: it's unlikely that the Business Service would need the SSO's internal state or function to change.
  19. SSO has a tightly bounded context.
  20. But business services are different.
  21. Catalog, Authorisation: most business services share the same core stream of facts. Most services live in here.
  22. The futures of business services are far more tightly intertwined.
  23. We need encapsulation to hide internal state and be loosely coupled. But we need the freedom to slice & dice shared data like any other dataset.
  24. But data systems have little to do with encapsulation.
  25. Service vs. database, data on the inside vs. data on the outside: a service interface hides data; a database interface amplifies it. Databases amplify the data they hold.
  26. The data dichotomy: data systems are about exposing data; services are about hiding it.
  27. These approaches serve different goals. Encapsulation protects us against future change; databases provide ultimate flexibility for the now.
  28. This affects services in one of two ways.
  29. Either (1) we constantly add to the interface as datasets grow: getOpenOrders(fulfilled=false, deliveryLocation=CA, orderValue=100, operator=GreaterThan).
  30. (1) Services can end up looking like kooky home-grown databases: getOrder(Id), getOrder(UserId), getAllOpenOrders(), getAllOrdersUnfulfilled(Id), getAllOrders().
  31. ...and DATA amplifies this "God Service" problem.
  32. Or (2) we give up and move whole datasets en masse, getAllOrders(), with the data copied internally.
  33. This leads to a different problem.
  34. Data diverges over time (Orders Service, Stock Service, Email Service, Fulfillment Service).
  35. The more mutable copies, the more data will diverge over time.
  36. The cycle of inadequacy (start here): nice neat services; the service contract proves too limiting; can we change both services together, easily? Broadening the contract is painful (eek, $$$), so: frack it, just give me ALL the data (is it a shared database?); then data diverges (many different versions of the same facts); so let's encapsulate and centralise to one copy, and the cycle starts again.
  37. These forces compete in the systems we build: divergence, accessibility, encapsulation.
  38. No perfect solution: a shared database gives better accessibility; service interfaces give better independence.
  39. Is there a better way?
  40. Data on the inside vs. data on the outside, separated by the service boundary.
  41. Make data-on-the-outside a 1st class citizen.
  42. Separate reads & writes with immutable streams.
  43. Event Driven + CQRS + Stateful Stream Processing: Orders, Stock, Fulfilment, UI and Fraud Services around Kafka. Commands go to owning services; queries are interpreted entirely inside each bounded context.
  44. Event Driven + CQRS + Stateful Stream Processing: events form a central shared narrative; writes/commands go to the "owning" service; reads/queries are wholly owned by each service; a stream processing toolkit.
  45. Kafka is a Streaming Platform.
  46. Kafka, a streaming platform: the log, connectors, producer, consumer, streaming engine.
  47. The Log.
  48. Kafka is a type of messaging system (writers, Kafka, readers). Important properties: petabytes of data; scales out linearly; built-in fault tolerance; the log can be replayed. (A minimal producer/consumer sketch follows the transcript.)
  49. Scaling becomes a concern of the broker, not the service.
  50. Services naturally load balance.
  51. This provides fault tolerance.
  52. Rewind & replay: reset to any point in the shared narrative.
  53. Compacted log: retains only the latest version of each key. (A sketch of configuring a compacted topic follows the transcript.)
  54. Service backbone: scalable, fault tolerant, concurrent, strongly ordered, retentive (the log, connectors, producer, consumer, streaming engine).
  55. A place to keep the data-on-the-outside as an immutable narrative.
  56. Now add stream processing.
  57. Kafka's Streaming API: a general, embeddable streaming engine (the log, connectors, producer, consumer, streaming engine).
  58. What is stream processing? A machine for combining and processing streams of events.
  59. Features, similar to a database query engine: join, filter, aggregate, view, window.
  60. Stateful stream processing adds tables & local state: an event-driven service joins streams and compacted streams from Kafka, keeping stream data and stream-tabular data (tables/views) next to its domain logic.
  61. Stateful stream processing: streams feed a stream processing engine that maintains a derived "table" alongside the domain logic, e.g. Avg(o.time - p.time) from orders, payment, user, group by user.region, over a 1 day window, emitting every second. (A simplified sketch follows the transcript.)
  62. Tools (Kafka, KStreams) make it easy to join streams of events (or tables of data) from different services: Orders, Payments, Stock, an Email Service, a legacy app.
  63. So we might join two event streams (payment events and purchase events) and create a local PurchasePayments store in our own domain model, inside the Orders Service.
  64. Make the projections queryable: the Orders Service exposes a Queryable State API at its context boundary, fed from Kafka. (A sketch follows the transcript.)
  65. Blend the concepts of services communicating and services sharing state (customer, orders, catalogue, payments).
  66. Very different to sharing a database: services communicate and share data via the log; slicing & dicing is embedded in each bounded context (customer, orders, catalogue, payments).
  67. But sometimes you have to move data.
  68. Easy to drop out to an external database if needed: the Fraud Service uses the Kafka Connect API to push data to the DB of its choice, inside its context boundary.
  69. The Connector API & open source ecosystem make this easier (e.g. Oracle and Cassandra connectors alongside the log, producer, consumer and streaming engine).
  70. WIRED principles: Wimpy (start simple, lightweight and fault tolerant); Immutable (build a retentive, shared narrative); Reactive (leverage asynchronicity, focus on the now); Evolutionary (use only the data you need today); Decentralized (receiver driven, coordination avoiding, no God services).
  71. So…
  72. Architecture is really about designing for change.
  73. Remember the data dichotomy: data systems are about exposing data; services are about hiding it.
  74. Shared data (e.g. the Catalog) is a reality for most organisations.
  75. Embrace data that lives and flows between services.
  76. Canonical immutable streams: embed the ability to slice and dice into each bounded context (distributed log, event-driven services).
  77. Balance the data dichotomy: a shared database gives better accessibility; service interfaces give better independence.
  78. Create a canonical shared narrative.
  79. Use stream processing to handle data-on-the-outside whilst prioritizing the NOW.
  80. Event-driven services around the log (streams & tables): ingestion services, services with polyglot persistence, simple services, streaming services.
  81. Twitter: @benstopford. Blog of this talk at http://confluent.io/blog
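
Slide 48 describes Kafka as a replayable log: writers append, readers consume independently, and the log can be rewound. The Java sketch below is illustrative only; the "orders" topic, key and payload are assumptions, not from the talk. It appends one event, then consumes from the topic and seeks back to the beginning to replay the narrative (slide 52).

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.kafka.common.serialization.StringSerializer;

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;

    public class OrdersLogSketch {
        public static void main(String[] args) {
            Properties producerProps = new Properties();
            producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
            producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

            // Writer: append an order event to the shared log.
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
                producer.send(new ProducerRecord<>("orders", "order-42", "{\"status\":\"CREATED\"}"));
            }

            Properties consumerProps = new Properties();
            consumerProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            consumerProps.put(ConsumerConfig.GROUP_ID_CONFIG, "fulfilment-service");
            consumerProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
            consumerProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
            consumerProps.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

            // Reader: each service consumes at its own pace; seeking back to the
            // beginning replays the whole narrative.
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
                consumer.subscribe(List.of("orders"));
                consumer.poll(Duration.ofSeconds(1));            // join group, get assignment
                consumer.seekToBeginning(consumer.assignment()); // rewind to offset 0
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    System.out.printf("offset=%d key=%s value=%s%n",
                            record.offset(), record.key(), record.value());
                }
            }
        }
    }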
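
Slide 53's compacted log keeps only the latest version of each key. A minimal sketch, assuming a hypothetical "customers" topic and cluster sizing, of creating such a topic with Kafka's admin client by setting cleanup.policy=compact:

    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewTopic;
    import org.apache.kafka.common.config.TopicConfig;

    import java.util.List;
    import java.util.Map;
    import java.util.Properties;

    public class CompactedTopicSketch {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

            try (Admin admin = Admin.create(props)) {
                // cleanup.policy=compact: the broker retains only the most recent
                // record per key, giving the "latest version" view of slide 53.
                NewTopic customers = new NewTopic("customers", 3, (short) 3)
                        .configs(Map.of(TopicConfig.CLEANUP_POLICY_CONFIG,
                                        TopicConfig.CLEANUP_POLICY_COMPACT));
                admin.createTopics(List.of(customers)).all().get();
            }
        }
    }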
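
Slide 61's query joins orders, payments and users and averages the time between order and payment per region over a one-day window. The Kafka Streams sketch below shows the same idea in simplified form, a continuously updated one-day count of orders per region; the topic name, keying by region and application id are assumptions for illustration.

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.Grouped;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KTable;
    import org.apache.kafka.streams.kstream.TimeWindows;
    import org.apache.kafka.streams.kstream.Windowed;

    import java.time.Duration;
    import java.util.Properties;

    public class OrdersPerRegionSketch {
        public static void main(String[] args) {
            StreamsBuilder builder = new StreamsBuilder();

            // Assumes an "orders" topic already keyed by region upstream.
            KStream<String, String> orders =
                    builder.stream("orders", Consumed.with(Serdes.String(), Serdes.String()));

            // Derived "table": a one-day windowed count of orders per region,
            // updated continuously as new events arrive.
            KTable<Windowed<String>, Long> ordersPerRegion = orders
                    .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
                    .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofDays(1)))
                    .count();

            ordersPerRegion.toStream().foreach((windowedRegion, count) ->
                    System.out.println(windowedRegion.key() + " -> " + count));

            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "orders-per-region-sketch");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }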
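
Slides 63 and 64 materialise a projection inside the service and make it queryable. A minimal Kafka Streams sketch, assuming an "orders" topic keyed by order id and an illustrative store name: materialise the topic as a local key-value store and serve point lookups through the interactive query (queryable state) API.

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.common.utils.Bytes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StoreQueryParameters;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.Materialized;
    import org.apache.kafka.streams.state.KeyValueStore;
    import org.apache.kafka.streams.state.QueryableStoreTypes;
    import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

    import java.util.Properties;

    public class QueryableOrdersSketch {
        public static void main(String[] args) {
            StreamsBuilder builder = new StreamsBuilder();

            // Materialise the orders topic as a local, queryable key-value store
            // inside this service's bounded context.
            builder.table("orders",
                    Consumed.with(Serdes.String(), Serdes.String()),
                    Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("orders-store")
                            .withKeySerde(Serdes.String())
                            .withValueSerde(Serdes.String()));

            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "orders-view-sketch");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            // In a real service, wait for the RUNNING state before querying.

            // Interactive query: answer "what is order-42?" from the service's own
            // materialised view, with no shared database involved.
            ReadOnlyKeyValueStore<String, String> store = streams.store(
                    StoreQueryParameters.fromNameAndType("orders-store",
                            QueryableStoreTypes.keyValueStore()));
            System.out.println("order-42 = " + store.get("order-42"));

            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }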
