The document describes how HBC evolved from a monolithic application to a microservices architecture with streams. It explains how they introduced Kafka and Kafka Streams to share data between microservices in real time, avoid common antipatterns, simplify development, and improve resilience and performance. The talk outlines how HBC uses Kafka Streams within their microservices to process streaming data, perform aggregations and joins, enable interactive queries, and power their search functionality.
2. About me
● Staff Engineer @HBCTech
● 15+ years of experience in
software development
● Open source enthusiast and
contributor
● @fabriziofortino
3. What this talk is about
● HBC Architecture Evolution
● Why Kafka?
● Kafka Overview
● Streaming Platform + Search + µ-services
● Use of Kafka Streams in a µ-services architecture to
○ Avoid common antipatterns
○ Simplify development experience
○ Improve resilience and performance
○ Enable experimentation
5. From monolith to µ-services + streams
● 2007: Monolith. RoR application + Postgres
● 2010: SOA. Broke up the monolith into large services
● 2012: µ-services. Incremental introduction of µ-services (up to ~300)
● 2016: µ-services + ƛ. Introduction of functions as a service
● 2018+: µ-services + streams. Streaming platform to share data among services
6. Search: A Snapshot, a Stream, a Bunch of Deltas
https://www.slideshare.net/InfoQ/lambda-architectures-a-snapshot-a-stream-a-bunch-of-deltas
* Changes are propagated in real time to Solr
* Rebuild of the index (s + Δ*) with zero downtime
* Same logic for batch and stream (thank you, akka-streams)
* V.O.T.: "We needed a relational DB to solve a relational problem"
[Architecture diagram: admin, Source of Truth (Postgres), Calatrava, Kinesis (Δ), S3 (snapshot s: brands, products, sales, channels, ...), View of Truth (Postgres, ΔVOT), svc-search-feed]
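The "s + Δ*" rebuild above can be sketched in plain Scala: take the snapshot (s) and fold the stream of deltas (Δ*) on top of it, in arrival order. This is an illustrative sketch, not HBC's actual code; all names are assumptions.

```scala
// Plain-Scala sketch of the "s + Δ*" rebuild: the view is the snapshot (s)
// with the stream of deltas (Δ*) folded on top, in arrival order.
final case class Product(id: String, name: String)

sealed trait Delta                                  // a change event
final case class Upsert(p: Product) extends Delta   // create or update
final case class Delete(id: String) extends Delta   // remove

// s: the snapshot, e.g. loaded from S3
val snapshot: Map[String, Product] = Map(
  "p01" -> Product("p01", "Red Shoes"),
  "p02" -> Product("p02", "Blak Dress")
)

// Δ*: deltas that arrived after the snapshot was taken, e.g. from Kinesis
val deltas: List[Delta] = List(
  Upsert(Product("p02", "Black Dress")), // corrects the typo in the snapshot
  Upsert(Product("p03", "White Scarf")),
  Delete("p01")
)

// Rebuilding the view is a fold: the same logic serves batch and stream.
val view: Map[String, Product] = deltas.foldLeft(snapshot) {
  case (acc, Upsert(p))  => acc + (p.id -> p)
  case (acc, Delete(id)) => acc - id
}
```

Because the rebuild is just a fold, replaying it against a fresh index is what makes the zero-downtime reindex possible.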
12. The Kafka Ecosystem
● Connect: to copy data between Kafka and another system
○ Sources: import data (eg: Postgres, MySQL, S3, Kinesis, etc)
○ Sinks: export data (eg: Postgres, Elasticsearch, Solr, etc)
● Kafka Streams: client library for building mission-critical real-time
applications
● Schema Registry: metadata serving layer for storing and retrieving
AVRO schemas. Allows evolution of schemas
● KSQL: streaming SQL engine
● Kubernetes Operator: simplifies provisioning and reduces the operational burden
13. Kafka Streams
● Cluster-free, framework-free, tiny client library (=~ 800 KB)
● Elastic, highly scalable, fault-tolerant
● Deployable as a standard Java/Scala application
● Built-in abstractions for the stream ↔ table duality
● Declarative functional DSL with support for
○ Transformations (eg: filter, map, flatMap)
○ Aggregations (eg: count, reduce, groupBy)
○ Joins (eg: leftJoin, outerJoin)
○ Windowing (session, sliding time)
● Internal key-value state store (in-memory or disk-backed based on
RocksDB) used for buffering, aggregations, interactive queries
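The stream ↔ table duality in the list above can be illustrated without Kafka at all: a table is what you get by folding a changelog stream, keeping the latest value per key. A plain-Scala sketch (not the Kafka Streams API), where `None` plays the role of a Kafka tombstone:

```scala
// A changelog stream of (key, value) records; None stands in for a
// Kafka tombstone (delete marker).
val changelog: List[(String, Option[Int])] = List(
  ("sku101", Some(5)),
  ("sku201", Some(2)),
  ("sku102", Some(1)),
  ("sku101", Some(4)),  // a later record for the same key wins
  ("sku201", None)      // tombstone: the key is removed
)

// Folding the stream yields the table: the latest value per key.
// This is what a KTable materializes into its local state store.
val table: Map[String, Int] = changelog.foldLeft(Map.empty[String, Int]) {
  case (acc, (k, Some(v))) => acc.updated(k, v)
  case (acc, (k, None))    => acc - k
}
```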
15. The data dichotomy between monoliths and µ-services
[Diagram: a monolith sharing one database vs. product-svc and inventory-svc each owning their own database, both consumed by web-pdp]
● A database interface amplifies data
● A service interface hides data
16. µ-services antipatterns 1/2: The God Service
A data service that grows by exposing an ever-increasing set of functions, to the point where it starts to look like a homegrown database:
○ getProduct(id)
○ getAllProducts(saleId)
○ getAllAvailableProducts(saleId)
○ getAllActiveProducts()
○ getSku(id)
○ getAllSkusByProduct()
17. µ-services antipatterns 2/2: the REST-to-ETL problem
When it's preferable to extract the data from a data service and keep a local copy, for reasons such as:
● Aggregation: the data needs to be combined with another dataset
● Caching: the data needs to be closer to get better performance
● Ownership: the data service provides limited functionality and can't be changed quickly enough
18. In the past we used caches to mitigate issues, but . . .

Take 1: on-heap caching (web-pdp embeds a product cache via a commons lib, fed by product-service and inventory-service)
* Cache refresh every 20 min
* Fast response time (data locally available)
* JSON from product service: 1 GB
* Startup time: 10 min
* JVM GC every 20 min on cache clear
* m4.xlarge, w/ 14 GB JVM heap

Take 2: centralized ElastiCache (web-pdp keeps a small L1 cache in front of a shared ElastiCache cluster)
* Startup time in seconds
* No more stop-the-world GC
* c4.xlarge (CPU!!!), w/ 6 GB JVM heap
* Performance degradation
19. Solution: Kafka + Kafka Streams (aka turning the DB inside out)
[Diagram: the database unbundled into its parts: the commit log becomes Kafka; indexes, caching, and the query engine move into Kafka Streams]
20. web-pdp - streaming topology 1/2

val builder = new StreamsBuilder()

def inventoriesStreamToTable(): KTable[InventoryKey, Inventory] = {
  implicit val consumedInv: Consumed[db.catalog.InventoryKey, db.catalog.Inventory] =
    Consumed.`with`(inventoryTopicConfig.keySerde, inventoryTopicConfig.valueSerde)
  builder.table(inventories)
}

def productStreamToTable(): KTable[ProductKey, catalog.Product] = {
  implicit val consumedProducts: Consumed[db.catalog.ProductKey, db.catalog.Product] =
    Consumed.`with`(productTopicConfig.keySerde, productTopicConfig.valueSerde)
  builder.table(products)
}

val inventoriesTable = inventoriesStreamToTable()
val productsTable = productStreamToTable()
Inventory Topic (kafka), in arrival order:
sku101 (pId=p01, value=5), sku201 (pId=p02, value=2), sku102 (pId=p01, value=1), sku101 (pId=p01, value=4)

Inventory KTable (web-pdp local store):
sku-id | value
sku201 | pId=p02, value=2
sku102 | pId=p01, value=1
sku101 | pId=p01, value=4

Products Topic (kafka), in arrival order:
p01 (Red Shoes), p02 (Blak Dress), p03 (White Scarf), p02 (Black Dress)

Product KTable (web-pdp local store):
p-id | value
p01 | Red Shoes
p03 | White Scarf
p02 | Black Dress
21. web-pdp - streaming topology 2/2

val inventoriesByProduct: KTable[ProductKey, Inventories] = inventoriesTable
  .groupBy((inventoryKey, inventory) =>
    (ProductKey(inventoryKey.productId), inventory)
  )
  .aggregate[Inventories](
    () => Inventories(List[Inventory]()),
    (_: ProductKey, inv: Inventory, acc: Inventories) => Inventories(inv :: acc.items),
    (_: ProductKey, inv: Inventory, acc: Inventories) => Inventories(acc.items.filter(_.skuId != inv.skuId))
  )

def materializeView(product: Product, inventories: Inventories): ProductEnriched = ???

productsTable
  .join(
    inventoriesByProduct,
    materializeView,
    Materialized.as("product-enriched")
  )

val kStreams = new KafkaStreams(builder.build(), streamsConfig)
kStreams.start()
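The aggregation above takes both an adder and a subtractor because KTable updates are changes: when an upstream row changes, Kafka Streams first calls the subtractor with the old value, then the adder with the new one. A plain-Scala simulation of that contract (illustrative, not the real Kafka Streams API):

```scala
final case class Inventory(skuId: String, productId: String, qty: Int)
final case class Inventories(items: List[Inventory])

// Adder: prepend the new inventory row to the aggregate.
def add(acc: Inventories, inv: Inventory): Inventories =
  Inventories(inv :: acc.items)

// Subtractor: drop the old row for that SKU from the aggregate.
def subtract(acc: Inventories, inv: Inventory): Inventories =
  Inventories(acc.items.filter(_.skuId != inv.skuId))

// sku101 of product p01 changes from qty=5 to qty=4:
// the old value is subtracted before the new one is added.
val before   = Inventories(List(Inventory("sku101", "p01", 5)))
val afterSub = subtract(before, Inventory("sku101", "p01", 5))
val after    = add(afterSub, Inventory("sku101", "p01", 4))
```

Without the subtractor, the stale qty=5 row would linger in the aggregate next to the new qty=4 one.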
Inventory KTable (web-pdp local store):
sku-id | value
sku201 | pId=p02, value=2
sku102 | pId=p01, value=1
sku101 | pId=p01, value=4

Inventory By Product KTable (web-pdp local store):
p-id | value
p02 | [sku=sku201, value=2]
p01 | [sku=sku102, value=1; sku=sku101, value=4]

Product KTable (web-pdp local store):
p-id | value
p01 | Red Shoes
p03 | White Scarf
p02 | Black Dress

products-enriched (web-pdp local store):
p-id | value
p02 | Black Dress [sku=sku201, value=2]
p01 | Red Shoes [sku=sku102, value=1; sku=sku101, value=4]
p03 | White Scarf
22. web-pdp - Interactive queries

val store: ReadOnlyKeyValueStore[ProductKey, ProductEnriched] =
  kStreams.store("product-enriched", QueryableStoreTypes.keyValueStore())
val productEnriched = store.get(new ProductKey("p01"))
products-enriched (web-pdp local store):
p-id | value
p02 | Black Dress [sku=sku201, value=2]
p01 | Red Shoes [sku=sku102, value=1; sku=sku101, value=4]
p03 | White Scarf
* Startup time: 2 min (single node)
* Really fast response time: data is local and fully precomputed
* No dependencies on other services - fewer things can go wrong
* Centralized monitoring / alerting
23. Summary
● Lambda architecture works well, but the implementation is not trivial
● Stream processing introduces a new programming paradigm
● Use the schema registry from day 1 to support compatible schema evolution and avoid breaking downstream consumers
● A replayable log (Kafka) and a streaming library (Kafka Streams) give you the freedom to slice, dice, enrich, and evolve data locally as it arrives, increasing resilience and performance
25. Resources
● Data on the Outside versus Data on the Inside [P. Helland - 2005]
● The Log: What every software engineer should know about real-time data's
unifying abstraction [J. Kreps - 2013]
● Questioning the Lambda Architecture [J. Kreps - 2014]
● Turning the database inside-out with Apache Samza [M. Kleppmann - 2015]
● The Data Dichotomy: Rethinking the Way We Treat Data and Services [B.
Stopford - 2016]
● Introducing Kafka Streams: Stream Processing Made Simple [J. Kreps - 2016]
● Building a µ-services ecosystem with Kafka Streams and KSQL [B. Stopford - 2017]
● Streams and Tables: Two Sides of the Same Coin [Sax - Weidlich - Wang - Freytag
- 2018]