A Global Source of Truth for the Microservices Generation

One of the biggest challenges for today’s microservice generation is data, which gets split into fragments that are spread across a company, making it hard to get a joined-up view. One solution is to have a single, shared database that all services can access, but sharing databases across different services is a well-known anti-pattern. What if instead you shared a replayable commit log? This is the basic notion behind one of the most interesting and provocative ideas to arise from the stream-processing community.

Ben Stopford explains how an event stream—stored in a replayable log—can be used as a source of truth, incorporating the retentive properties of a database in a system designed to share data across many teams, cloud providers, or geographies. This leads to the idea of a database turned inside out: a central commit log spawning many continuously updated caches and views, embedded in different microservices. Ben examines the subtler, systemic effects that the pattern leads to—better autonomy, easier evolution and a more ephemeral approach to data—and explores the use of logs that span geographical regions and cloud providers. Along the way, he reflects on the practicalities of using logs as a distributed storage system and looks at some of the real-world applications of this approach.
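
The core idea, a replayable log from which each service derives its own view, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `EventLog`, `CartEvent` and `project_cart` are hypothetical names, the log is an in-memory list standing in for a Kafka topic, and the events mirror the shopping-cart example used later in the slides.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CartEvent:
    """An immutable fact: `delta` items were added (+) or removed (-)."""
    cart_id: str
    item: str
    delta: int


class EventLog:
    """Append-only, replayable log (in-memory stand-in for a Kafka topic)."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)
        return len(self._events) - 1  # offset of the appended event

    def replay(self, from_offset=0):
        return iter(self._events[from_offset:])


def project_cart(log, cart_id):
    """Derive the current cart contents by folding over the events."""
    cart = Counter()
    for event in log.replay():
        if event.cart_id == cart_id:
            cart[event.item] += event.delta
    # Drop items whose quantity has fallen to zero.
    return {item: qty for item, qty in cart.items() if qty > 0}


log = EventLog()
for item, delta in [("trousers", 2), ("jumper", 1), ("trousers", -1), ("hat", 1)]:
    log.append(CartEvent("cart-1", item, delta))

# One view: the current state of the cart.
cart_view = project_cart(log, "cart-1")

# A second, independent view over the same log: how many items were removed.
removed_count = sum(-e.delta for e in log.replay() if e.delta < 0)
```

Because both views are derived from the log, either can be discarded and rebuilt by replaying from offset 0, which is what makes the log rather than any one view the source of truth.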

Published in: Software

  1. 1. A Global Source of Truth for the Microservices Generation Ben Stopford Office of the CTO Confluent @benstopford
  2. 2. Where does the data live? In the Events
  3. 3. Trade Surveillance Project • 9 months sourcing 16 data sets • Different formats (including for historical extracts) • Batch based approach
  4. 4. Select Organizational Events • Event Streams: Orders, Payments, Customers, Distinct Visits • Stream Processing: SELECT * FROM ORDERS O, CUSTOMERS C WHERE O.REGION = 'EU' AND C.TYPE = 'Platinum' • Destination: Elasticsearch, Postgres, AWS Lambda, Other Kafka
  5. 5. Event-driven designs are (mostly) location independent
  6. 6. Streaming Platform [diagram: producers and consumers (apps, search, monitoring, NoSQL, DWH, Hadoop) connected through a central streaming platform]
  7. 7. Messaging + Event Storage (Kafka stores petabytes of data) + Stream Processing (real-time processing over streams and tables) + Scalability (clusters of hundreds of machines; global) + …
  8. 8. Stream Processing
  9. 9. Formula 1 – Race Telemetry • 400 sensors on car • 70,000 derivative measures • Events streamed back from the race track to HQ • Analyzed in real time with stream processing • Tire modelling (e.g. temperature, pressure, suspension compression) • Racing line • Aerodynamics • Machine learning and physics models • Replayed later for post-race analysis
  10. 10. Post race analysis
  11. 11. Analytics Source of Truth
  12. 12. This is a form of Event Sourcing We can apply this idea to any application
  13. 13. What is event sourcing?
  14. 14. In Event Sourcing events are immutable, stateless and truthful.
  15. 15. A Shopping Cart as Events • 2 Trousers added • 1 Jumper added • 1 Trousers removed • 1 Hat added • Checkout
  16. 16. Traditional Event Sourcing (Store raw events in a database in time order) • Save state changes as events • Journal of every state change
  17. 17. Traditional Event Sourcing (Derive current state from truthful events) • Save state changes as events • Apply projection • Query by customer Id - Projection applied on read - Constantly rederived from truthful events - No schema migration
  18. 18. Using Kafka: A Distributed Log • All events, stored indefinitely
  19. 19. Using Kafka: Log, but no query • Can’t query by CustomerId
  20. 20. CQRS with Kafka: using events to build a view (DB, Cache, Stream Processor) • Events accumulate in the log • Projection (Stream Processor) builds the View • Query by customer Id - Event stream is source of truth - View can be a DB, Cache or Stateful Stream Processor - View can be re-derived from the event stream http://bit.ly/kafka-microservice-examples
  21. 21. Does anyone actually do this?
  22. 22. New York Times Source of Truth Every article since 1851 https://www.confluent.io/blog/publishing-apache-kafka-new-york-times/ Normalized assets (images, articles, bylines, tags all separate messages) Denormalized into “Content View”
  23. 23. What do I do if I already have a database?
  24. 24. Alternate Approach: “Write Through” (Event model in DB, CDC Connector) • Write • Query • Every write becomes an event Note: - Database is now the source of truth. - Events are a “cache” available to others. - Users can read their writes immediately (not true of CQRS) COMMON IN PRACTICE
  25. 25. What about Microservices?
  26. 26. We can repurpose the event stream • Source of Truth • View: Shipping Service • View: Full-text Search
  27. 27. Join datasets from many different sources in real-time • Orders Service, Payment Service, Customer Service • Event Log • Fraud Service: projection created in Kafka Streams API
  28. 28. Create Aggregate Streams (easier to consume, keep apps stateless) • Orders Service, Payment Service, Customer Service • Aggregate Events: Order, Payment, Customer
  29. 29. The Source of Truth: services flex around a central source of truth (a.k.a. Forward Deployed Event Cache, The Database Inside Out) - Historical and real-time data are both self service (pluggable) - Source systems don’t need to republish - Views are use case specific / decoupled / autonomous - Encourages Event-driven design • Many views derived from the log • Event driven services: Billing, Shipping, Fraud, Fulfilment
  30. 30. All patterns involve trade offs
  31. 31. Do I need to store events in a messaging system?
  32. 32. It’s a pattern, adopt it when you’re ready [chart: Value against Investment & time] 1. Single Team, Microservices / Streaming Analytics 2. Cached Datasets & Streaming Apps 3. Automated data provisioning 4. Multi-Team Cluster 5. Global Deployment
  33. 33. Stateful Stream Processing requires storage Transaction Payments KStreams Customers Table (Read Only) Intermediary State (Read/Write) Orders Event Storage
  34. 34. Start with Dimensions • Facts (Streams): Orders, Visits, Payments. Large, high velocity. • Dimensions (Tables): Customers, Accounts, Products. Small, low velocity. • Dimensions are typically only useful as a whole dataset
  35. 35. Stateful Stream Processing is Stateful. Aren’t stateful applications bad?
  36. 36. Separate stateful and stateless operations (Just like you do with a database) KSQL Stateful Data Layer Stateless Application layer Business logic goes here Source of Truth
  37. 37. For the hip and trendy, use FaaS KSQL Stateless FaaS FaaS FaaS FaaS Autoscale Stateful Data Layer
  38. 38. Won’t reloading events and applying projections be slow?
  39. 39. Writes are typically the limiting factor Kafka Streams: • RocksDB: capable of ~10M x 500 B objects per minute on top end hardware (roughly GbE speed) Regular database: • Postgres will bulk-load ~1M rows per minute. (Kafka delivers data at ~network speed)
  40. 40. Lean Data – take only the data you need • If messaging remembers, databases don’t have to. SELECT O.OrderId, C.Email FROM ORDERS O, CUSTOMERS C WHERE O.REGION = 'EU' AND C.TYPE = 'Platinum'
  41. 41. Is Kafka built for long term storage?
  42. 42. It’s ok to Store Data in Kafka • Largely built by a guy who built databases (DB2…) • Log files are immutable once they roll • (unless it’s a compacted topic) • Log is O(1) read, O(1) write • But care required: Writes can block behind historical scans • Some users run dedicated clusters for reading old data • ZFS has several page cache optimizations • Tiered storage would help
  43. 43. What about GDPR?
  44. 44. Anonymize with a Stream Processor Anonymized events Anonymization metadata
  45. 45. Delete messages by key with a compacted topic https://www.confluent.io/blog/handling-gdpr-log-forget/
  46. 46. Evolving with events
  47. 47. Events are immutable, stateless and truthful.
  48. 48. Events as a Global Source of Truth
  49. 49. In summary • Broadcast events. • Cache shared datasets in the log and make them discoverable. • Let users manipulate event streams directly. • Drive simple microservices, or prepare use case specific views in a DB of your choice.
  50. 50. Self-service data, wherever you are, in whatever form you need, at whatever scale.
  51. 51. Thank you @benstopford Microservices blog with associated code http://bit.ly/kafka-microservice-examples Book: https://www.confluent.io/designing-event-driven-systems
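
Slide 45's idea of deleting messages by key with a compacted topic can be illustrated with a small simulation. This is a sketch of the compaction semantics only, not Kafka's implementation: `compact` is a hypothetical helper, the log is a plain list of `(key, value)` pairs, and a value of `None` plays the role of a tombstone. Real Kafka compacts asynchronously and keeps tombstones for a configurable period before dropping them.

```python
def compact(log):
    """Keep only the latest record per key; drop keys whose latest value is a tombstone."""
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)  # later records overwrite earlier ones
    # Preserve log order (by offset) among the surviving records.
    survivors = sorted(latest.items(), key=lambda kv: kv[1][0])
    return [(key, value) for key, (_, value) in survivors if value is not None]


log = [
    ("alice", {"email": "alice@old.example"}),
    ("bob", {"email": "bob@example.com"}),
    ("alice", {"email": "alice@new.example"}),  # update supersedes the first record
    ("bob", None),                              # tombstone: "forget" bob
]

compacted = compact(log)
```

After compaction only the latest record for `alice` remains, and every record keyed by `bob` is gone, which is the property the GDPR slide relies on.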
