
Event Driven Microservices


The presentation explains why we picked Kafka as our streaming hub and how we use Kafka Streams to avoid common anti-patterns, streamline the development experience, improve resilience, enhance performance, and enable experimentation. A step-by-step example introduces the Kafka Streams DSL and shows what happens under the hood of a stateful streaming application.



  1. Event driven µ-services: Rethinking Data and Services with Streams. Dublin µServices User Group, 27th September 2018. @fabriziofortino
  2. About me ● Staff Engineer @HBCTech ● 15+ years of experience in software development ● Open source enthusiast and contributor ● @fabriziofortino
  3. What this talk is about ● HBC architecture evolution ● Why Kafka? ● Kafka overview ● Streaming platform + search + µ-services ● Use of Kafka Streams in a µ-services architecture to ○ avoid common antipatterns ○ simplify the development experience ○ improve resilience and performance ○ enable experimentation
  4. HBC: Stores + Online Banners
  5. From monolith to µ-services + streams ● 2007 Monolith: RoR application + Postgres ● 2010 SOA: broke up the monolith into large services ● 2012 µ-services: incremental introduction of µ-services (up to ~300) ● 2016 µ-services + ƛ: introduction of functions as a service ● 2018 µ-services + streams: streaming platform to share data among services
  6. Search: A Snapshot, a Stream, a Bunch of Deltas ● Changes are propagated in real time to Solr ● Rebuild of the index (s + Δ*) with zero downtime ● Same logic for batch and stream (thank you, akka-streams) ● V.O.T.: “We needed a relational DB to solve a relational problem” (Diagram: Source of Truth, Postgres admin; Kinesis; Calatrava Δ; S3; brands, products, sales, channels; s + Δ; VOT View of Truth, Postgres; svc-search-feed)
  7. 7. Hello Kafka!
  8. Kafka Topics Anatomy and Log Compaction. Topic T1, Partition 0 (offset: key → value): 0: k0 → foo, 1: k1 → bar, 2: k2 → baz, 3: k1 → qux, 4: k4 → quux, 5: k5 → corge, 6: k0 → grault, 7: k7 → garply, 8: k8 → waldo, 9: k9 → fred, 10: k10 → plugh, 11: k10 → xyzzy. Producers append at the head; Consumer A is at offset 6, Consumer B at offset 11. After log compaction only the latest record per key survives, so offsets 2, 3, 4, 5, 6, 7, 8, 9, 11 remain (k0 keeps grault, k1 keeps qux, k10 keeps xyzzy).
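The compaction rule on this slide (keep only the record with the highest offset per key) can be sketched in a few lines of plain Scala. This is an illustrative model, not the Kafka implementation; `Record` and `compact` are hypothetical names, and the sample log reuses a subset of the slide's data.

```scala
// Model of a compacted topic: a sequence of (offset, key, value) records.
// Compaction retains only the record with the highest offset for each key.
case class Record(offset: Int, key: String, value: String)

def compact(log: List[Record]): List[Record] =
  log.groupBy(_.key)                 // bucket records by key
    .values.map(_.maxBy(_.offset))   // keep the latest record per key
    .toList.sortBy(_.offset)         // restore log order

val log = List(
  Record(0, "k0", "foo"), Record(1, "k1", "bar"), Record(2, "k2", "baz"),
  Record(3, "k1", "qux"), Record(6, "k0", "grault"),
  Record(10, "k10", "plugh"), Record(11, "k10", "xyzzy")
)

// k0@0, k1@1 and k10@10 are superseded by later writes to the same keys,
// so only offsets 2, 3, 6, 11 survive compaction here.
val compacted = compact(log)
```

Note that offsets are preserved, not renumbered: a consumer position such as Consumer A's offset 6 stays valid after compaction.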
  9. The Kafka Ecosystem ● Connect: copies data between Kafka and other systems ○ Sources: import data (e.g. Postgres, MySQL, S3, Kinesis) ○ Sinks: export data (e.g. Postgres, Elasticsearch, Solr) ● Kafka Streams: client library for building mission-critical real-time applications ● Schema Registry: metadata serving layer for storing and retrieving Avro schemas; allows schemas to evolve ● KSQL: streaming SQL engine ● Kubernetes Operator: simplifies provisioning and reduces operational burden
  10. Kafka Streams ● Cluster-free, framework-free, tiny client library (~800 KB) ● Elastic, highly scalable, fault-tolerant ● Deployable as a standard Java/Scala application ● Built-in abstractions for the stream ↔ table duality ● Declarative functional DSL with support for ○ transformations (e.g. filter, map, flatMap) ○ aggregations (e.g. count, reduce, groupBy) ○ joins (e.g. leftJoin, outerJoin) ○ windowing (session, sliding time) ● Internal key-value state store (in-memory or disk-backed, based on RocksDB) used for buffering, aggregations, and interactive queries
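The stream ↔ table duality mentioned above can be made concrete without Kafka at all: a table is the fold of a changelog stream, where each record upserts a key and a null value (a tombstone) deletes it. A minimal plain-Scala sketch, with `toTable` as a hypothetical name and `Option` standing in for nullable values:

```scala
// A table is the fold of its changelog stream:
// Some(v) upserts the key, None (a tombstone) deletes it.
def toTable[K, V](changelog: List[(K, Option[V])]): Map[K, V] =
  changelog.foldLeft(Map.empty[K, V]) {
    case (table, (k, Some(v))) => table.updated(k, v) // upsert latest value
    case (table, (k, None))    => table - k           // tombstone removes the key
  }

val changelog = List(
  ("p01", Some("Red Shoes")),
  ("p02", Some("Blak Dress")),
  ("p02", Some("Black Dress")), // later record overwrites the earlier value
  ("p03", Some("White Scarf")),
  ("p01", None)                 // tombstone deletes p01
)
val table = toTable(changelog)
```

This is the same mechanics a KTable uses when materializing a topic into its local RocksDB store, and it is also why log compaction (previous slide) is safe for table-like topics: only the latest record per key matters.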
  11. Streaming Platform + Search + µ-services (Diagram: product, inventory, and pricing systems feed the streaming platform through Connect; Kafka Streams serves the Search app and µ-services such as web-pdp and web-homepage; other systems attach on both sides through Connect)
  12. The data dichotomy between monoliths and µ-services (Diagram contrasting a monolith with a single database against µ-services such as product-svc and inventory-svc, each owning its own database, with web-pdp as consumer; captions: “Interface amplifies data” vs “Interface hides data”)
  13. µ-services antipatterns 1/2: The God Service. A data service that grows by exposing an ever-increasing set of functions, to the point where it starts to look like a homegrown database: ○ getProduct(id) ○ getAllProducts(saleId) ○ getAllAvailableProducts(saleId) ○ getAllActiveProducts() ○ getSku(id) ○ getAllSkusByProduct()
  14. µ-services antipatterns 2/2: The REST-to-ETL problem. Sometimes it is preferable to extract the data from a data service and keep it local, for different reasons: ● Aggregation: the data needs to be combined with another dataset ● Caching: the data needs to be closer to get better performance ● Ownership: the data service provides limited functionality and can't be changed quickly enough
  15. In the past we used caches to mitigate these issues, but... Take 1: on-heap caching (web-pdp + commons lib holding a product cache in front of product-service and inventory-service) ○ cache refresh every 20 min ○ fast response time (data locally available) ○ ~1 GB of JSON from the product service ○ startup time 10 min ○ JVM GC every 20 min on cache clear ○ m4.xlarge with a 14 GB JVM heap. Take 2: centralized ElastiCache (plus an L1 cache) ○ startup time in seconds ○ no more stop-the-world GC ○ c4.xlarge (CPU!!!) with a 6 GB JVM heap ○ but performance degradation
  16. Solution: Kafka + Kafka Streams (aka turning the DB inside out) (Diagram: the commit log lives in Kafka; indexes, caching, and the query engine live in Kafka Streams)
  17. web-pdp - streaming topology 1/2

     val builder = new StreamsBuilder()

     def inventoriesStreamToTable(): KTable[InventoryKey, Inventory] = {
       implicit val consumedInv: Consumed[db.catalog.InventoryKey, db.catalog.Inventory] =
         Consumed.`with`(inventoryTopicConfig.keySerde, inventoryTopicConfig.valueSerde)
       builder.table(inventories)
     }

     def productStreamToTable(): KTable[ProductKey, catalog.Product] = {
       implicit val consumedProducts: Consumed[db.catalog.ProductKey, db.catalog.Product] =
         Consumed.`with`(productTopicConfig.keySerde, productTopicConfig.valueSerde)
       builder.table(products)
     }

     val inventoriesTable = inventoriesStreamToTable()
     val productsTable = productStreamToTable()

     (Diagram: the Inventory topic carries sku101 → (pId=p01, value=5), sku201 → (pId=p02, value=2), sku102 → (pId=p01, value=1), sku101 → (pId=p01, value=4), and is materialized into the Inventory KTable in the web-pdp local store with only the latest value per sku. The Products topic carries p01 → Red Shoes, p02 → Blak Dress, p03 → White Scarf, p02 → Black Dress, so the Product KTable ends up with the corrected p02 → Black Dress.)
  18. web-pdp - streaming topology 2/2

     val inventoriesByProduct: KTable[ProductKey, Inventories] = inventoriesTable
       .groupBy((inventoryKey, inventory) => (ProductKey(inventoryKey.productId), inventory))
       .aggregate[Inventories](
         () => Inventories(List[Inventory]()),
         (_: ProductKey, inv: Inventory, acc: Inventories) => Inventories(inv :: acc.items),
         (_: ProductKey, inv: Inventory, acc: Inventories) => Inventories(acc.items.filter(_.skuId != inv.skuId))
       )

     def materializeView(product: Product, inventories: Inventories): ProductEnriched = ???

     productsTable.join(inventoriesByProduct, materializeView)

     val kStreams = new KafkaStreams(builder.build(), streamsConfig)
     kStreams.start()

     (Diagram: the Inventory KTable is re-keyed by product and aggregated into the Inventory By Product KTable: p01 → [sku=sku102 value=1, sku=sku101 value=4], p02 → [sku=sku201 value=2]; joining it with the Product KTable produces the products-enriched local store: p01 → Red Shoes [sku=sku102 value=1, sku=sku101 value=4], p02 → Black Dress [sku=sku201 value=2], p03 → White Scarf.)
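The adder/subtractor pair in the aggregate above is how a KTable aggregation stays correct under updates: when a row in the source table changes, the old value is first subtracted from the aggregate, then the new value is added. A dependency-free plain-Scala sketch of that mechanic (the case classes mirror the slide; the data simulates an update of sku101):

```scala
// Plain-Scala model of the adder/subtractor aggregation (no Kafka dependency).
case class Inventory(skuId: String, productId: String, value: Int)
case class Inventories(items: List[Inventory])

// Adder: a new/updated row is prepended to the per-product aggregate.
def adder(inv: Inventory, acc: Inventories): Inventories =
  Inventories(inv :: acc.items)

// Subtractor: the old row for that sku is removed from the aggregate.
def subtractor(inv: Inventory, acc: Inventories): Inventories =
  Inventories(acc.items.filter(_.skuId != inv.skuId))

// Simulate sku101 of product p01 changing from value=4 to value=5:
// the old row is subtracted first, then the new row is added.
val before = Inventories(List(Inventory("sku102", "p01", 1), Inventory("sku101", "p01", 4)))
val oldRow = Inventory("sku101", "p01", 4)
val newRow = Inventory("sku101", "p01", 5)
val after  = adder(newRow, subtractor(oldRow, before))
```

Without the subtractor the aggregate would keep both the stale and the fresh sku101 rows, which is exactly the bug the two-function API prevents.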
  19. web-pdp - Interactive queries

     val store: ReadOnlyKeyValueStore[ProductKey, ProductEnriched] =
       kStreams.store("products-enriched", QueryableStoreTypes.keyValueStore())
     val productEnriched = store.get(ProductKey("p01"))

     (Diagram: the products-enriched local store holds p01 → Red Shoes [sku=sku102 value=1, sku=sku101 value=4], p02 → Black Dress [sku=sku201 value=2], p03 → White Scarf.)

     ● Startup time: 2 min (single node) ● Really fast response time: data is local and fully precomputed ● No dependencies on other services, so fewer things can go wrong ● Centralized monitoring / alerting
  20. Summary ● The lambda architecture works well, but the implementation is not trivial ● Stream processing introduces a new programming paradigm ● Use the schema registry from day 1 to support schema-change compatibility and avoid breaking downstream consumers ● A replayable log (Kafka) and a streaming library (Kafka Streams) give you the freedom to slice, dice, enrich, and evolve data locally as it arrives, increasing resilience and performance
  21. Books
  22. Resources ● Data on the Outside versus Data on the Inside [P. Helland, 2005] ● The Log: What every software engineer should know about real-time data's unifying abstraction [J. Kreps, 2013] ● Questioning the Lambda Architecture [J. Kreps, 2014] ● Turning the database inside-out with Apache Samza [M. Kleppmann, 2015] ● The Data Dichotomy: Rethinking the Way We Treat Data and Services [B. Stopford, 2016] ● Introducing Kafka Streams: Stream Processing Made Simple [J. Kreps, 2016] ● Building a µ-services ecosystem with Kafka Streams and KSQL [B. Stopford, 2017] ● Streams and Tables: Two Sides of the Same Coin [Sax, Weidlich, Wang, Freytag, 2018]