The document describes how HBC evolved from a monolithic application to a microservices architecture with streams. It explains how they introduced Kafka and Kafka Streams to share data between microservices in real time, avoid common antipatterns, simplify development, and improve resilience and performance. The talk outlines how HBC uses Kafka Streams within their microservices to process streaming data, perform aggregations and joins, enable interactive queries, and power their search functionality.
2. About me
● Staff Engineer @HBCTech
● 15+ years of experience in
software development
● Open source enthusiast and
contributor
● @fabriziofortino
3. What this talk is about
● HBC Architecture Evolution
● Why Kafka?
● Kafka Overview
● Streaming Platform + Search + µ-services
● Use of Kafka Streams in a µ-services architecture to
○ Avoid common antipatterns
○ Simplify development experience
○ Improve resilience and performance
○ Enable experimentation
5. From monolith to µ-services + streams
● 2007: Monolith. RoR application + Postgres
● 2010: SOA. Broke up the monolith into large services
● 2012: µ-services. Incremental introduction of µ-services (up to ~300)
● 2016: µ-services + ƛ. Introduction of functions as a service
● 2018+: µ-services + streams. Streaming platform to share data among services
6. Search: A Snapshot, a Stream, a Bunch of Deltas
https://www.slideshare.net/InfoQ/lambda-architectures-a-snapshot-a-stream-a-bunch-of-deltas
* Changes are propagated in real time to Solr
* Rebuild of the index (s + Δ*) with zero downtime
* Same logic for batch and stream (thank you, akka-streams)
* V.O.T.: "We needed a relational DB to solve a relational problem"
[Architecture diagram: admin, Source of Truth (Postgres), Calatrava, Kinesis (Δ), S3 (snapshot s: brands, products, sales, channels, ...), View of Truth (Postgres, ΔVOT), svc-search-feed]
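The "s + Δ*" rebuild above can be sketched in plain Scala: take the snapshot (s) and fold the stream of deltas (Δ*) on top of it, in arrival order. This is an illustrative sketch, not HBC's actual code; all names are assumptions.

```scala
// Plain-Scala sketch of the "s + Δ*" rebuild: the view is the snapshot (s)
// with the stream of deltas (Δ*) folded on top, in arrival order.
final case class Product(id: String, name: String)

sealed trait Delta                                  // a change event
final case class Upsert(p: Product) extends Delta   // create or update
final case class Delete(id: String) extends Delta   // remove

// s: the snapshot, e.g. loaded from S3
val snapshot: Map[String, Product] = Map(
  "p01" -> Product("p01", "Red Shoes"),
  "p02" -> Product("p02", "Blak Dress")
)

// Δ*: deltas that arrived after the snapshot was taken, e.g. from Kinesis
val deltas: List[Delta] = List(
  Upsert(Product("p02", "Black Dress")), // corrects the typo in the snapshot
  Upsert(Product("p03", "White Scarf")),
  Delete("p01")
)

// Rebuilding the view is a fold: the same logic serves batch and stream.
val view: Map[String, Product] = deltas.foldLeft(snapshot) {
  case (acc, Upsert(p))  => acc + (p.id -> p)
  case (acc, Delete(id)) => acc - id
}
```

Because the rebuild is just a fold, replaying it against a fresh index is what makes the zero-downtime reindex possible.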
12. The Kafka Ecosystem
● Connect: to copy data between Kafka and another system
○ Sources: import data (eg: Postgres, MySQL, S3, Kinesis, etc)
○ Sinks: export data (eg: Postgres, Elasticsearch, Solr, etc)
● Kafka Streams: client library for building mission-critical real-time
applications
● Schema Registry: metadata serving layer for storing and retrieving
AVRO schemas. Allows evolution of schemas
● KSQL: streaming SQL engine
● Kubernetes Operator: simplifies provisioning and reduces the operational burden
13. Kafka Streams
● Cluster-free, framework-free, tiny client library (=~ 800 KB)
● Elastic, highly scalable, fault-tolerant
● Deployable as a standard Java/Scala application
● Built-in abstractions for the stream ↔ table duality
● Declarative functional DSL with support for
○ Transformations (eg: filter, map, flatMap)
○ Aggregations (eg: count, reduce, groupBy)
○ Joins (eg: leftJoin, outerJoin)
○ Windowing (session, sliding time)
● Internal key-value state store (in-memory or disk-backed based on
RocksDB) used for buffering, aggregations, interactive queries
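The stream ↔ table duality in the list above can be illustrated without Kafka at all: a table is what you get by folding a changelog stream, keeping the latest value per key. A plain-Scala sketch (not the Kafka Streams API), where `None` plays the role of a Kafka tombstone:

```scala
// A changelog stream of (key, value) records; None stands in for a
// Kafka tombstone (delete marker).
val changelog: List[(String, Option[Int])] = List(
  ("sku101", Some(5)),
  ("sku201", Some(2)),
  ("sku102", Some(1)),
  ("sku101", Some(4)),  // a later record for the same key wins
  ("sku201", None)      // tombstone: the key is removed
)

// Folding the stream yields the table: the latest value per key.
// This is what a KTable materializes into its local state store.
val table: Map[String, Int] = changelog.foldLeft(Map.empty[String, Int]) {
  case (acc, (k, Some(v))) => acc.updated(k, v)
  case (acc, (k, None))    => acc - k
}
```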
15. The data dichotomy between monoliths and µ-services
[Diagram: a monolith sharing one database vs. product-svc and inventory-svc each owning their own database, both consumed by web-pdp]
● A database interface amplifies data
● A service interface hides data
16. µ-services antipatterns 1/2: The God Service
A data service that grows by exposing an ever-increasing set of functions, to the point where it starts to look like a homegrown database:
○ getProduct(id)
○ getAllProducts(saleId)
○ getAllAvailableProducts(saleId)
○ getAllActiveProducts()
○ getSku(id)
○ getAllSkusByProduct()
17. µ-services antipatterns 2/2: the REST-to-ETL problem
When it's preferable to extract the data from a data service and keep a local copy, for reasons such as:
● Aggregation: the data needs to be combined with another dataset
● Caching: the data needs to be closer to get better performance
● Ownership: the data service provides limited functionality and can't be changed quickly enough
18. In the past we used caches to mitigate issues, but . . .

Take 1: on-heap caching (web-pdp embeds a product cache via a commons lib, fed by product-service and inventory-service)
* Cache refresh every 20 min
* Fast response time (data locally available)
* JSON from product service: 1 GB
* Startup time: 10 min
* JVM GC every 20 min on cache clear
* m4.xlarge, w/ 14 GB JVM heap

Take 2: centralized ElastiCache (web-pdp keeps a small L1 cache in front of a shared ElastiCache cluster)
* Startup time in seconds
* No more stop-the-world GC
* c4.xlarge (CPU!!!), w/ 6 GB JVM heap
* Performance degradation
19. Solution: Kafka + Kafka Streams (aka turning the DB inside out)
[Diagram: the database unbundled into its parts: the commit log becomes Kafka; indexes, caching, and the query engine move into Kafka Streams]
20. web-pdp - streaming topology 1/2

val builder = new StreamsBuilder()

def inventoriesStreamToTable(): KTable[InventoryKey, Inventory] = {
  implicit val consumedInv: Consumed[db.catalog.InventoryKey, db.catalog.Inventory] =
    Consumed.`with`(inventoryTopicConfig.keySerde, inventoryTopicConfig.valueSerde)
  builder.table(inventories)
}

def productStreamToTable(): KTable[ProductKey, catalog.Product] = {
  implicit val consumedProducts: Consumed[db.catalog.ProductKey, db.catalog.Product] =
    Consumed.`with`(productTopicConfig.keySerde, productTopicConfig.valueSerde)
  builder.table(products)
}

val inventoriesTable = inventoriesStreamToTable()
val productsTable = productStreamToTable()
Inventory Topic (kafka), in arrival order:
sku101 (pId=p01, value=5), sku201 (pId=p02, value=2), sku102 (pId=p01, value=1), sku101 (pId=p01, value=4)

Inventory KTable (web-pdp local store):
sku-id | value
sku201 | pId=p02, value=2
sku102 | pId=p01, value=1
sku101 | pId=p01, value=4

Products Topic (kafka), in arrival order:
p01 (Red Shoes), p02 (Blak Dress), p03 (White Scarf), p02 (Black Dress)

Product KTable (web-pdp local store):
p-id | value
p01 | Red Shoes
p03 | White Scarf
p02 | Black Dress
21. web-pdp - streaming topology 2/2

val inventoriesByProduct: KTable[ProductKey, Inventories] = inventoriesTable
  .groupBy((inventoryKey, inventory) =>
    (ProductKey(inventoryKey.productId), inventory)
  )
  .aggregate[Inventories](
    () => Inventories(List[Inventory]()),
    (_: ProductKey, inv: Inventory, acc: Inventories) => Inventories(inv :: acc.items),
    (_: ProductKey, inv: Inventory, acc: Inventories) => Inventories(acc.items.filter(_.skuId != inv.skuId))
  )

def materializeView(product: Product, inventories: Inventories): ProductEnriched = ???

productsTable
  .join(
    inventoriesByProduct,
    materializeView,
    Materialized.as("product-enriched")
  )

val kStreams = new KafkaStreams(builder.build(), streamsConfig)
kStreams.start()
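The aggregation above takes both an adder and a subtractor because KTable updates are changes: when an upstream row changes, Kafka Streams first calls the subtractor with the old value, then the adder with the new one. A plain-Scala simulation of that contract (illustrative, not the real Kafka Streams API):

```scala
final case class Inventory(skuId: String, productId: String, qty: Int)
final case class Inventories(items: List[Inventory])

// Adder: prepend the new inventory row to the aggregate.
def add(acc: Inventories, inv: Inventory): Inventories =
  Inventories(inv :: acc.items)

// Subtractor: drop the old row for that SKU from the aggregate.
def subtract(acc: Inventories, inv: Inventory): Inventories =
  Inventories(acc.items.filter(_.skuId != inv.skuId))

// sku101 of product p01 changes from qty=5 to qty=4:
// the old value is subtracted before the new one is added.
val before   = Inventories(List(Inventory("sku101", "p01", 5)))
val afterSub = subtract(before, Inventory("sku101", "p01", 5))
val after    = add(afterSub, Inventory("sku101", "p01", 4))
```

Without the subtractor, the stale qty=5 row would linger in the aggregate next to the new qty=4 one.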
Inventory KTable (web-pdp local store):
sku-id | value
sku201 | pId=p02, value=2
sku102 | pId=p01, value=1
sku101 | pId=p01, value=4

Inventory By Product KTable (web-pdp local store):
p-id | value
p02 | [sku=sku201, value=2]
p01 | [sku=sku102, value=1; sku=sku101, value=4]

Product KTable (web-pdp local store):
p-id | value
p01 | Red Shoes
p03 | White Scarf
p02 | Black Dress

products-enriched (web-pdp local store):
p-id | value
p02 | Black Dress [sku=sku201, value=2]
p01 | Red Shoes [sku=sku102, value=1; sku=sku101, value=4]
p03 | White Scarf
22. web-pdp - Interactive queries

val store: ReadOnlyKeyValueStore[ProductKey, ProductEnriched] =
  kStreams.store("product-enriched", QueryableStoreTypes.keyValueStore())
val productEnriched = store.get(new ProductKey("p01"))
products-enriched (web-pdp local store):
p-id | value
p02 | Black Dress [sku=sku201, value=2]
p01 | Red Shoes [sku=sku102, value=1; sku=sku101, value=4]
p03 | White Scarf
* Startup time: 2 min (single node)
* Really fast response time: data is local and fully precomputed
* No dependencies on other services - fewer things can go wrong
* Centralized monitoring / alerting
23. Summary
● Lambda architecture works well, but the implementation is not trivial
● Stream processing introduces a new programming paradigm
● Use the schema registry from day 1 to support compatible schema evolution and avoid breaking downstream consumers
● A replayable log (Kafka) and a streaming library (Kafka Streams) give you the freedom to slice, dice, enrich, and evolve data locally as it arrives, increasing resilience and performance
25. Resources
● Data on the Outside versus Data on the Inside [P. Helland - 2005]
● The Log: What every software engineer should know about real-time data's
unifying abstraction [J. Kreps - 2013]
● Questioning the Lambda Architecture [J. Kreps - 2014]
● Turning the database inside-out with Apache Samza [M. Kleppmann - 2015]
● The Data Dichotomy: Rethinking the Way We Treat Data and Services [B.
Stopford - 2016]
● Introducing Kafka Streams: Stream Processing Made Simple [J. Kreps - 2016]
● Building a µ-services ecosystem with Kafka Streams and KSQL [B. Stopford - 2017]
● Streams and Tables: Two Sides of the Same Coin [Sax - Weidlich - Wang - Freytag
- 2018]