
NDC London 2017 - The Data Dichotomy- Rethinking Data and Services with Streams

When building service-based systems, we don’t generally think too much about data. If we need data from another service, we ask for it. This pattern works well for whole swathes of use cases, particularly ones where datasets are small and requirements are simple. But real business services have to join and operate on datasets from many different sources. This can be slow and cumbersome in practice.

These problems stem from an underlying dichotomy. Data systems are built to make data as accessible as possible—a mindset that focuses on getting the job done. Services, instead, focus on encapsulation—a mindset that allows independence and autonomy as we evolve and grow. But these two forces inevitably compete in most serious service-based architectures.

Ben Stopford explains why understanding and accepting this dichotomy is an important part of designing service-based systems at any significant scale. Ben looks at how companies make use of a shared, immutable sequence of records to balance data that sits inside their services with data that is shared, an approach that allows the likes of Uber, Netflix, and LinkedIn to scale to millions of events per second.

Ben concludes by examining the potential of stream processors as a mechanism for joining significant, event-driven datasets across a whole host of services, and explains why stream processing provides many of the benefits of data warehousing, but without the same degree of centralization.


NDC London 2017 - The Data Dichotomy- Rethinking Data and Services with Streams

  1. 1. 1 The Data Dichotomy: Rethinking data and services with streams Ben Stopford @benstopford
  2. 2. 2 Build Features Build for the Future
  3. 3. 3 Evolution!
  4. 4. 4 KAFKA Serving Layer (Cassandra etc.) Kafka Streams / KSQL Streaming Platforms Data is embedded in each engine High Throughput Messaging Clustered Java App
  5. 5. 5 authorization_attempts possible_fraud Streaming Example
  6. 6. 6 CREATE STREAM possible_fraud AS SELECT card_number, count(*) FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count(*) > 3; authorization_attempts possible_fraud
  (Slides 7–11 repeat the same query as slide 6.)
  12. 12. 12 Streaming == Manipulating Data in Flight
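For comparison, the same windowed count can be expressed in the Kafka Streams Java DSL. The sketch below is illustrative rather than taken from the talk: it assumes the authorization_attempts topic is keyed by card_number with String keys and values, and reuses the topic names from the slides.

    import java.time.Duration;
    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.KeyValue;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.Produced;
    import org.apache.kafka.streams.kstream.TimeWindows;

    public class PossibleFraudApp {
      public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Count authorization attempts per card over a 5-minute tumbling window,
        // then emit cards with more than 3 attempts to the possible_fraud topic.
        builder.<String, String>stream("authorization_attempts")
            .groupByKey()
            .windowedBy(TimeWindows.of(Duration.ofMinutes(5)))
            .count()
            .toStream()
            .filter((windowedCard, count) -> count > 3)
            .map((windowedCard, count) -> KeyValue.pair(windowedCard.key(), count))
            .to("possible_fraud", Produced.with(Serdes.String(), Serdes.Long()));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "possible-fraud");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        new KafkaStreams(builder.build(), props).start();
      }
    }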
  13. 13. 13 Business Applications
  14. 14. 14 Ecosystems App: Increasingly we build ecosystems
  15. 15. 15 SOA / Microservices / EDA Customer Service Shipping Service
  16. 16. 16 The Problem is DATA
  17. 17. 17 Most services share the same core facts. Catalog Most services live in here
  18. 18. 18 Orders Service Payments Service Customers Service Data becomes spread out and we need to bring it together Useful Grid
  19. 19. 19 Service A Service B Service C One option is to share a database
  20. 20. 20 Service A Service B Service C Databases provide a very rich form of coupling
  21. 21. 21 Two different forces compete in our designs
  22. 22. 22 Single Sign On Business Service authorise() We are taught to encapsulate LOOSE COUPLING!
  23. 23. 23 But data systems have little to do with encapsulation
  24. 24. 24 Service Database Data on inside Data on outside Data on inside Data on outside Interface hides data Interface amplifies data Databases amplify the data they hold
  25. 25. 25 The data dichotomy Data systems are about exposing data. Services are about hiding it.
  26. 26. 26 Microservices shouldn’t share a database!
  27. 27. 27 Tension We want all the good stuff which comes with a database. We don’t want to share that database with anyone else. But we do want to share datasets in a sensible way.
  28. 28. 28 So how do we share data between services? Orders Service Shipping Service Customer Service Webserver
  29. 29. 29 Buying an iPad (with REST) Submit Order shipOrder() getCustomer() Orders Service Shipping Service Customer Service Webserver
  30. 30. 30 Buying an iPad with Events Message Broker (Kafka) Notification Data is replicated (incrementally) Submit Order Order Created Customer Updated Orders Service Shipping Service Customer Service Webserver KAFKA
  31. 31. 31 Events for Notification Only Message Broker (Kafka) Submit Order Order Created getCustomer() REST Notification Orders Service Shipping Service Customer Service Webserver KAFKA
  32. 32. 32 Events for Data Locality Customer Updated Submit Order Order Created Data is replicated Orders Service Shipping Service Customer Service Webserver KAFKA
  33. 33. 33 Events have two hats Notification Data replication
  34. 34. 34 Events are the key to scalable service ecosystems
  35. 35. 35 Streaming is the toolset for dealing with events as they move!
  36. 36. 36 Streaming Platform The Log Connectors Producer Consumer Streaming Engine
  37. 37. 37 Streaming Platform The Log Connectors Producer Consumer Streaming Engine
  38. 38. 38 What is a Distributed Log?
  39. 39. 39 Shard on the way in Producing Services Kafka Consuming Services
  40. 40. 40 Each shard is a queue Producing Services Kafka Consuming Services
  41. 41. 41 Consumers share load Producing Services Kafka Consuming Services
  42. 42. 42 A log can be rewound and replayed Rewind & Replay
  43. 43. 43 Compacted Log (retains only latest version) Version 3 Version 2 Version 1 Version 2 Version 1 Version 5 Version 4 Version 3 Version 2 Version 1
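For reference, compaction is just a topic configuration. Below is a rough sketch of creating a compacted topic with the Kafka admin client; the topic name, partition count, and replication factor are placeholders, not details from the talk.

    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.NewTopic;
    import org.apache.kafka.common.config.TopicConfig;

    public class CreateCompactedTopic {
      public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
          // cleanup.policy=compact keeps only the latest record per key,
          // which is the "retains only latest version" behaviour on the slide.
          NewTopic customers = new NewTopic("customers", 6, (short) 3)
              .configs(Map.of(TopicConfig.CLEANUP_POLICY_CONFIG,
                              TopicConfig.CLEANUP_POLICY_COMPACT));
          admin.createTopics(List.of(customers)).all().get();
        }
      }
    }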
  44. 44. 44 Streaming Platform The Log Connectors Producer Consumer Streaming Engine
  45. 45. 45 Kafka Connect Kafka Connect Kafka Connect Kafka
  46. 46. 46 Streaming Platform The Log Connectors Producer Consumer Streaming Engine
  47. 47. 47 A database engine for data-in-flight
  48. 48. 48 SELECT card_number, count(*) FROM authorization_attempts WINDOW (SIZE 5 MINUTE) GROUP BY card_number HAVING count(*) > 3; Continuously Running Queries
  49. 49. 49 Features similar to a database query engine: Join, Filter, Aggregate, View, Window
  50. 50. 50 Join Streams and Tables: a Stream (Kafka topic) joined with a Table (compacted topic) in Kafka Streams / KSQL
  51. 51. 51 Handle Asynchronicity In an asynchronous world, will the payment come first, or the order? KAFKA Buffer 5 mins Join by Key
  52. 52. 52 Handle Asynchronicity KAFKA Buffer 5 mins Join by Key KStream orders = builder.stream("Orders"); KStream payments = builder.stream("Payments"); orders.join(payments, KeyValue::new, JoinWindows.of(1 * MIN)) .peek((key, pair) -> emailer.sendMail(pair));
  53. 53. 53 KAFKA Join A KTable is just a stream with infinite retention
  54. 54. 54 A KTable is just a stream with infinite retention KStream orders = builder.stream("Orders"); KStream payments = builder.stream("Payments"); KTable customers = builder.table("Customers"); orders.join(payments, EmailTuple::new, JoinWindows.of(1*MIN)) .join(customers, (tuple, cust) -> tuple.setCust(cust)) .peek((key, tuple) -> emailer.sendMail(tuple)); Materialize a table in two lines of code!
  55. 55. 55 KAFKA Emailer With KSQL and Node.js Create stream ToEmail From Orders, Payment, Customer where …
  56. 56. 56 Scales Out
  57. 57. 57 Streaming is about 1. Processing data incrementally 2. Moving data to where it needs to be processed (quickly and efficiently) Notification / Data Replication
  58. 58. 58 Steps to Streaming Services
  59. 59. 59 1. Take Responsibility for the past and evolve
  60. 60. 60 Stay Simple. Take Responsibility for the past Browser Webserver
  61. 61. 61 Evolve Forwards Browser Webserver Orders Service
  62. 62. 62 2. Raise events. Don’t talk to services.
  63. 63. 63 Raise events. Don’t talk to services Browser Webserver Orders Service
  64. 64. 64 KAFKA Order Requested Order Received Browser Webserver Orders Service Raise events. Don’t talk to services
  65. 65. 65 KAFKA Order Requested Order Validated Order Received Browser Webserver Orders Service Raise events. Don’t talk to services
  66. 66. 66 KAFKA Order Requested Order Validated Order Received Browser Webserver Orders Service Use Kafka as a Backbone for Events
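As a concrete illustration of raising an event instead of calling the downstream service, here is a minimal producer sketch; the topic name, key, and JSON payload are assumptions rather than details from the talk.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class OrderEventProducer {
      public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
          // The webserver raises an Order Requested event on the backbone;
          // it does not call the Orders Service directly. Any interested
          // service subscribes to the topic and reacts in its own time.
          producer.send(new ProducerRecord<>("orders", "order-123",
              "{\"type\":\"OrderRequested\",\"orderId\":\"order-123\"}"));
          producer.flush();
        }
      }
    }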
  67. 67. 67 3. Use Connect (& CDC) to evolve away from legacy
  68. 68. 68 KAFKA Order Requested Order Validated Order Received Browser Webserver Orders Service Evolve away from Legacy
  69. 69. 69 KAFKA Order Requested Order Validated Order Received Browser Webserver Orders Service Use the Database as a ‘Seam’ Connect Products
  70. 70. 70 4. Make use of Schemas
  71. 71. 71 KAFKA Order Requested Order Validated Order Received Browser Webserver Orders Service Schemas are your API Connect Products Schema Registry
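To make "schemas are your API" concrete, here is a hedged sketch of a producer wired to Confluent's Avro serializer and Schema Registry; the registry URL, topic, and the inline Avro schema are all placeholders.

    import java.util.Properties;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class AvroOrderProducer {
      public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        // KafkaAvroSerializer registers the record's schema with the Schema
        // Registry, so incompatible changes are rejected at the service boundary.
        props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081");

        // Placeholder schema for an Order Requested event.
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"OrderRequested\","
          + "\"fields\":[{\"name\":\"orderId\",\"type\":\"string\"}]}");
        GenericRecord event = new GenericData.Record(schema);
        event.put("orderId", "order-123");

        try (KafkaProducer<String, Object> producer = new KafkaProducer<>(props)) {
          producer.send(new ProducerRecord<>("orders", "order-123", event));
          producer.flush();
        }
      }
    }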
  72. 72. 72 5. Use the Single Writer Principle
  73. 73. 73 KAFKA Order Requested Order Validated Order Received Browser Webserver Orders Service Apply the single writer principle Connect Products Schema Registry Order Completed
  74. 74. 74 Orders Service Email Service T1 T2 T3 T4 REST Service T5 Single Writer Principle
  75. 75. 75 Single Writer Principle - Creates local consistency points in the absence of global consistency - Makes schema upgrades easier to manage.
  76. 76. 76 6. Store Datasets in the Log
  77. 77. 77 Messaging that Remembers Orders Customers Payments Stock
  78. 78. 78 KAFKA Order Requested Order Validated Order Received Browser Webserver Orders Service New Service, No Problem! Connect Products Schema Registry Order Completed Repricing
  79. 79. 79 Orders Customers Payments Stock Single, Shared Source of Truth
  80. 80. 80 But how do you query a log?
  81. 81. 81 7. Move Data to Code
  82. 82. 82
  83. 83. 83 Connect Order Requested Order Validated Order Completed Order Received Products Browser Webserver Schema Registry Orders Service Stock Stock Materialize Stock ‘View’ Inside Service KAFKA
  84. 84. 84 Connect Order Requested Order Validated Order Completed Order Received Products Browser Webserver Schema Registry Orders Service Stock Stock Take only the data we need KAFKA
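A rough sketch of what "materialize a view inside the service" can look like with Kafka Streams: the stock topic is replicated into a local, queryable store, so the Orders Service reads it without a remote call. Topic and store names are illustrative.

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.common.utils.Bytes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StoreQueryParameters;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.Materialized;
    import org.apache.kafka.streams.state.KeyValueStore;
    import org.apache.kafka.streams.state.QueryableStoreTypes;
    import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

    public class StockViewInOrdersService {
      public static void main(String[] args) throws Exception {
        StreamsBuilder builder = new StreamsBuilder();

        // Replicate the stock topic into a local store: the data moves to the code.
        builder.globalTable("stock",
            Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("stock-view")
                .withKeySerde(Serdes.String())
                .withValueSerde(Serdes.String()));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "orders-service");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        while (streams.state() != KafkaStreams.State.RUNNING) {
          Thread.sleep(100);  // wait for the local store to become queryable
        }

        // The service answers stock queries from its local copy of the data.
        ReadOnlyKeyValueStore<String, String> stockView = streams.store(
            StoreQueryParameters.fromNameAndType("stock-view",
                QueryableStoreTypes.keyValueStore()));
        System.out.println("Stock for item-42: " + stockView.get("item-42"));
      }
    }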
  85. 85. 85 Data Movement Be realistic: • Network is no longer the bottleneck • Indexing is: • In memory indexes help • Keep datasets focused
  86. 86. 86 8. Use the log as a ‘database’
  87. 87. 87 Connect Order Requested Order Validated Order Completed Order Received Products Browser Webserver Schema Registry Orders Service Reserved Stocks Stock Stock Reserved Stocks Apply Event Sourcing KAFKA Table
  88. 88. 88 Connect Order Requested Order Validated Order Completed Order Received Products Browser Webserver Schema Registry Orders Service Reserved Stocks Stock Stock Reserved Stocks Order Service Loads Reserved Stocks on Startup KAFKA
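A simplified sketch of how a service might rebuild its reserved-stock view by replaying a topic from the beginning on startup. The topic name, types, and single-partition assumption are illustrative, and it ignores the standby-replica and checkpoint optimizations listed on the next slide.

    import java.time.Duration;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class ReservedStockLoader {
      public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "orders-service");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        Map<String, String> reservedStock = new HashMap<>();
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
          TopicPartition tp = new TopicPartition("reserved_stocks", 0);
          consumer.assign(List.of(tp));
          consumer.seekToBeginning(List.of(tp));

          long end = consumer.endOffsets(List.of(tp)).get(tp);
          // Replay the log from offset 0 to rebuild the in-memory view.
          while (consumer.position(tp) < end) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            for (ConsumerRecord<String, String> r : records) {
              reservedStock.put(r.key(), r.value());  // latest value per key wins
            }
          }
        }
        System.out.println("Loaded " + reservedStock.size() + " reserved stock entries");
      }
    }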
  89. 89. 89 Kafka has several features for reducing the need to move data on startup - Standby Replicas - Disk Checkpoints - Compacted topics
  90. 90. 90 9. Use Transactions to tie All Interactions Together
  91. 91. 91 OrderRequested (IPad) 2a. Order Validated 2b. IPad Reserved 2c. Offset Commit Internal State: Stock = 17 Reservations = 2 Tie Events & State with Transactions
  92. 92. 92 Connect TRANSACTION Order Requested Order Validated Order Completed Order Received Products Browser Webserver Schema Registry Orders Service Reserved Stocks Stock Stock Reserved Stocks Transactions KAFKA
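A minimal sketch of the transactional wiring the slide describes: the output event and the consumed offset are committed in one Kafka transaction, so either both become visible or neither does. Topic names, the offset, and the transactional id are placeholders; a real service would take them from the records it is processing.

    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class TransactionalOrderValidator {
      public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        props.put("transactional.id", "orders-service-1");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.initTransactions();
        try {
          producer.beginTransaction();
          // Emit the Order Validated event...
          producer.send(new ProducerRecord<>("orders", "order-123",
              "{\"type\":\"OrderValidated\",\"orderId\":\"order-123\"}"));
          // ...and commit the offset of the input record in the same transaction.
          producer.sendOffsetsToTransaction(
              Map.of(new TopicPartition("order-requests", 0), new OffsetAndMetadata(42L)),
              "orders-service");
          producer.commitTransaction();
        } catch (Exception e) {
          producer.abortTransaction();
        } finally {
          producer.close();
        }
      }
    }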
  93. 93. 93 10. Bridge the Sync/Async Divide with a Streaming Ecosystem
  94. 94. 94 POST GET Load Balancer ORDERS OV TOPIC Order Validations KAFKA INVENTORY Orders Inventory Fraud Service Order Details Service Inventory Service (see previous figure) Order Created Order Validated Orders View Q in CQRS Orders Service C in CQRS Services in the Micro: Orders Service Find the code online!
  95. 95. 95 Orders Customers Payments Stock Each service is optimized for autonomy A Database Inside Out HISTORICAL EVENT STREAMS
  96. 96. 96 Kafka KAFKA New York Tokyo London Global / Disconnected Ecosystems
  97. 97. 97 So…
  98. 98. 98 Good architectures have little to do with this:
  99. 99. 99 It’s about how systems evolve over time
  100. 100. 100 Request driven isn’t enough • High coupling • Hard to handle async flows • Hard to move and join datasets.
  101. 101. 101 Leverage the Duality of Events Notification Data replication
  102. 102. 102 With a toolset built for data in flight
  103. 103. 103 The data dichotomy Data systems are about exposing data. Services are about hiding it. Remember the data dichotomy
  104. 104. 104 The Data Dichotomy We want all the good stuff which comes with a database. We don’t want to share that database with anyone else. But we do want to share datasets in a sensible way.
  105. 105. 105 • Broadcast events • Retain them in the log • Compose streaming functions • Recast the event stream into views when you need to query. Event Driven Services
  106. 106. 106 Services built on a Streaming Platform
  107. 107. 107 Thank You @benstopford Blog Series: https://www.confluent.io/blog/tag/microservices/ Code: https://github.com/confluentinc/kafka-streams-examples
