SlideShare a Scribd company logo
Event driven
µ-services
Rethinking Data and
Services with Streams
Dublin μServices User Group
27th September 2018
@fabriziofortino
About me
● Staff Engineer @HBCTech
● 15+ years of experience in
software development
● Open source enthusiast and
contributor
● @fabriziofortino
What this talk is about
● HBC Architecture Evolution
● Why Kafka?
● Kafka Overview
● Streaming Platform + Search + µ-services
● Use of Kafka Streams in a µ-services architecture to
○ Avoid common antipatterns
○ Simplify development experience
○ Improve resilience and performances
○ Enable experimentation
HBC: Stores + Online Banners
2007
Monolith
RoR application +
Postgres
2010
SOA
Broke up the
monolith in large
services
2012
µ-services
Incremental introduction
of µ-services (up to
~300)
2016
µ-services + ƛ
Introduction of
functions as a
service
ƛ ƛ
ƛ ƛ
2018 +
µ-services +
streams
Streaming platform
to share data among
services
ƛ ƛ
ƛ ƛ
From monolith to µ-services + streams
https://www.slideshare.net/InfoQ/
lambda-architectures-a-snapshot-a-stream-a-bunch-of-deltas
* Changes are propagated in real-time to Solr
* Rebuild of index (s + 횫*) with zero down time
* Same logic for batch  stream (thank you
akka-streams)
* V.O.T.: “We needed a relational DB to solve a relationa
problem”
Search: A Snapshot, a Stream,  a Bunch of Deltas
Kinesis
Calatrava
횫
S3
Brands, products,
sales, channels, ...
s
횫VOT
View of Truth - PG
svc-search
-feed
Source of Truth - PG
admin
Hello Kafka!
Kafka Topics Anatomy and Log Compaction
0 1 2 3 4 5 6 7 8 9 10 11
k0 k1 k2 k1 k4 k5 k0 k7 k8 k9 k10 k10
foo bar baz qux quix conge grault garply waldo fred plugh xyzzy
Producers
Consumer A
(offset = 6)
Consumer B
(offset = 11)
OFFSET
KEY
VALUE
Topic T1 - Partition 0
2 3 4 5 6 7 8 9 11
k2 k1 k4 k5 k0 k7 k8 k9 k10
baz qux quix conge grault garply waldo fred xyzzy
OFFSET
KEY
VALUE
Topic T1 - Partition 0
(after log compaction)
The Kafka Ecosystem
● Connect: to copy data between Kafka and another system
○ Sources: import data (eg: Postgres, MySQL, S3, Kinesis, etc)
○ Sinks: export data (eg: Postgres, Elasticsearch, Solr, etc)
● Kafka Streams: client library for building mission-critical real-time
applications
● Schema Registry: metadata serving layer for storing and retrieving
AVRO schemas. Allows evolution of schemas
● KSQL: streaming SQL engine
● Kubernetes Operator: simplify provisioning and operational burden
Kafka Streams
● Cluster/Framework free tiny client library (=~ 800 KB)
● Elastic, highly scalable, fault-tolerant
● Deployable as a standard Java/Scala application
● Built-in abstractions for streams ↔ table duality
● Declarative functional DSL with support for
○ Transformations (eg: filter, map, flatMap)
○ Aggregations (eg: count, reduce, groupBy)
○ Joins (eg: leftJoin, outerJoin)
○ Windowing (session, sliding time)
● Internal key-value state store (in-memory or disk-backed based on
RocksDB) used for buffering, aggregations, interactive queries
Streaming Platform + Search + µ-services
product
inventory
pricing
OtherSystems
Streaming Platform
OtherSystems
Applications
Search App
Kafka Streams
Connect
Connect
web-pdp
µ-services
web-homepage
µ-services
The data dichotomy between monoliths and µ-services
Monolith Database
product-svc
inventory-svc
Database
Database
web-pdp
web-pdp
Interface amplifies data
Interface hides data
µ-services antipatterns 1/2: The God Service
A data service that grows exposing an increasing set of functions to the
point where it starts look like a homegrown database
○ getProduct(id)
○ getAllProducts(saleId)
○ getAllAvailableProducts(saleId)
○ getAllActiveProducts()
○ getSku(id)
○ getAllSkusByProduct()
µ-services antipatterns 2/2: REST-to-ETL problem
When it’s preferable to extract the data from a data service and keep it
local for different reasons:
● Aggregation: data needs to be combined with another dataset
● Caching: data needs to be closer to get better performances
● Ownership: the data services provide limited functionalities and can’t
be changed quickly enough
In the past we used caches to mitigate issues, but . . .
product-service inventory-service
web-pdp
commons lib
product cache
* Cache refresh every 20 min
* Fast response time (data locally
available)
* JSON from product service  1Gb
* Startup time  10m
* JVM GC every 20m on cache clear
* m4.xlarge, w/ 14Gb JVM Heap
Take 1: on heap caching
web-pdp
commons lib
* Startup Time in seconds
* No more stop-the-world GC
* c4.xlarge (CPU!!!), w/ 6Gb JVM Heap
* Performance degradation
product-service
inventory-service
Elasticacache
Take 2: centralized Elasticache
L1 CACHE
Solution: Kafka + Kafka Streams
(aka turning the DB inside out)
Commit Log
Indexes
Caching
Query Engine
Kafka Streams
web-pdp - streaming topology 1/2
val builder = new StreamsBuilder()
def inventoriesStreamToTable(): KTable[InventoryKey, Inventory] = {
implicit val consumedInv: Consumed[db.catalog.InventoryKey, db.catalog.Inventory] =
Consumed.`with`(inventoryTopicConfig.keySerde, inventoryTopicConfig.valueSerde)
builder.table(inventories)
}
def productStreamToTable(): KTable[ProductKey, catalog.Product] = {
implicit val consumedProducts: Consumed[db.catalog.ProductKey, db.catalog.Product] =
Consumed.`with`(productTopicConfig.keySerde, productTopicConfig.valueSerde)
builder.table(products)
}
val inventoriesTable = inventoriesStreamToTable()
val productsTable = productStreamToTable()
sku101 sku201 sku102 sku101
pId=p01
value=5
pId=p02
value=2
pId=p01
value=1
pId=p01
value=4
sku-id value
sku201 pId=p02
value=2
sku102 pId=p01
value=1
sku101 pId=p01
value=4
Inventory Topic
(kafka)
Inventory KTable
(web-pdp local store)
p01 p02 p03 p02
Red
Shoes
Blak
Dress
White
Scarf
Black
Dress
p-id value
p01 Red
Shoes
p03 White
Scarf
p02 Black
Dress
Products Topic
(kafka)
Product KTable
(web-pdp local store)
web-pdp - streaming topology 2/2
val inventoriesByProduct: KTable[ProductKey, Inventories] = inventoriesTable
.groupBy((inventoryKey, inventory) =
(ProductKey(inventoryKey.productId), inventory))
)
.aggregate[Inventories](
() = Inventories(List[Inventory]()),
(_: ProductKey, inv: Inventory, acc: Inventories) = Inventories(inv :: acc.items),
(_: ProductKey, inv: Inventory, acc: Inventories) = Inventories(acc.items.filter(_.skuId !=
inv.skuId))
)
def materializeView(product: Product, inventories: Inventories): ProductEnriched = ???
productsTable
.join(
inventoriesByProduct,
materializeView,
Materialized.as(product-enriched)
)
val kStreams = new KafkaStreams(builder.build(), streamsConfig)
kStreams.start()
sku-id value
sku201 pId=p02
value=2
sku102 pId=p01
value=1
sku101 pId=p01
value=4
Inventory KTable
(web-pdp local store)
p-id value
p02 [sku=sku201, value=2]
p01 [sku=sku102, value=1
sku=sku101, value=4]]
Inventory By Product KTable
(web-pdp local store)
p-id value
p02 Black Dress
[sku=sku201, value=2]
p01 Red Shoes
[sku=sku102, value=1
sku=sku101, value=4]]
p03 White Scarf
products-enriched
(web-pdp local store)
p-id value
p01 Red
Shoes
p03 White
Scarf
p02 Black
Dress
Product KTable
(web-pdp local store)
web-pdp - Interactive queries
val store: ReadOnlyKeyValueStore[ProductKey, ProductEnriched] =
kStreams.store(product-enriched, QueryableStoreTypes.keyValueStore())
val productEnriched = store.get(new ProductKey(p01))
p-id value
p02 Black Dress
[sku=sku201, value=2]
p01 Red Shoes
[sku=sku102, value=1
sku=sku101, value=4]]
p03 White Scarf
products-enriched
(web-pdp local store)
* Startup Time  2 min (single node)
* Really fast response time: data is
local and fully precomputed
* No dependencies to other services -
less things can go wrong
* Centralized monitoring / alerting
Summary
● Lambda architecture works well but the implementation is not trivial
● Stream processing introduces a new programming paradigm
● Use the schema registry from day 1 to support schema changes
compatibility and avoid to break downstream consumers
● A replayable log (Kafka) and a streaming library (Kafka Streams) give the
freedom to slice, dice, enrich and evolve data locally as it arrives
increasing resilience and performance
Books
Resources
● Data on the Outside versus Data on the Inside [P. Helland - 2005]
● The Log: What every software engineer should know about real-time data's
unifying abstraction [J. Kreps - 2013]
● Questioning the Lambda Architecture [J. Kreps - 2014]
● Turning the database inside-out with Apache Samza [M. Kleppmann - 2015]
● The Data Dichotomy: Rethinking the Way We Treat Data and Services [B.
Stopford - 2016]
● Introducing Kafka Streams: Stream Processing Made Simple [J. Kreps - 2016]
● Building a µ-services ecosystem with Kafka Stream and KSQL [B. Stopford -
2017]
● Streams and Tables: Two Sides of the Same Coin [Sax - Weidlich - Wang - Freytag
- 2018]
Event Driven Microservices

More Related Content

What's hot

Introduction to the Processor API
Introduction to the Processor APIIntroduction to the Processor API
Introduction to the Processor APIconfluent
 
TenMax Data Pipeline Experience Sharing
TenMax Data Pipeline Experience SharingTenMax Data Pipeline Experience Sharing
TenMax Data Pipeline Experience SharingChen-en Lu
 
KSQL - Stream Processing simplified!
KSQL - Stream Processing simplified!KSQL - Stream Processing simplified!
KSQL - Stream Processing simplified!Guido Schmutz
 
Introducing Scylla Manager: Cluster Management and Task Automation
Introducing Scylla Manager: Cluster Management and Task AutomationIntroducing Scylla Manager: Cluster Management and Task Automation
Introducing Scylla Manager: Cluster Management and Task AutomationScyllaDB
 
KSQL: Streaming SQL for Kafka
KSQL: Streaming SQL for KafkaKSQL: Streaming SQL for Kafka
KSQL: Streaming SQL for Kafkaconfluent
 
Stream Application Development with Apache Kafka
Stream Application Development with Apache KafkaStream Application Development with Apache Kafka
Stream Application Development with Apache KafkaMatthias J. Sax
 
Kick your database_to_the_curb_reston_08_27_19
Kick your database_to_the_curb_reston_08_27_19Kick your database_to_the_curb_reston_08_27_19
Kick your database_to_the_curb_reston_08_27_19confluent
 
Seastar Summit 2019 Keynote
Seastar Summit 2019 KeynoteSeastar Summit 2019 Keynote
Seastar Summit 2019 KeynoteScyllaDB
 
Flink Forward Berlin 2017: Dr. Radu Tudoran - Huawei Cloud Stream Service in ...
Flink Forward Berlin 2017: Dr. Radu Tudoran - Huawei Cloud Stream Service in ...Flink Forward Berlin 2017: Dr. Radu Tudoran - Huawei Cloud Stream Service in ...
Flink Forward Berlin 2017: Dr. Radu Tudoran - Huawei Cloud Stream Service in ...Flink Forward
 
Unified Stream Processing at Scale with Apache Samza - BDS2017
Unified Stream Processing at Scale with Apache Samza - BDS2017Unified Stream Processing at Scale with Apache Samza - BDS2017
Unified Stream Processing at Scale with Apache Samza - BDS2017Jacob Maes
 
Transforming Mobile Push Notifications with Big Data
Transforming Mobile Push Notifications with Big DataTransforming Mobile Push Notifications with Big Data
Transforming Mobile Push Notifications with Big Dataplumbee
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Guido Schmutz
 
Tale of ISUCON and Its Bench Tools
Tale of ISUCON and Its Bench ToolsTale of ISUCON and Its Bench Tools
Tale of ISUCON and Its Bench ToolsSATOSHI TAGOMORI
 
Journey and evolution of Presto@Grab
Journey and evolution of Presto@GrabJourney and evolution of Presto@Grab
Journey and evolution of Presto@GrabShubham Tagra
 
What to expect from MariaDB Platform X5, part 2
What to expect from MariaDB Platform X5, part 2What to expect from MariaDB Platform X5, part 2
What to expect from MariaDB Platform X5, part 2MariaDB plc
 
Stream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka StreamsStream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka StreamsTim Ysewyn
 
Flink Forward Berlin 2017: Stefan Richter - A look at Flink's internal data s...
Flink Forward Berlin 2017: Stefan Richter - A look at Flink's internal data s...Flink Forward Berlin 2017: Stefan Richter - A look at Flink's internal data s...
Flink Forward Berlin 2017: Stefan Richter - A look at Flink's internal data s...Flink Forward
 
Kafka Streams: the easiest way to start with stream processing
Kafka Streams: the easiest way to start with stream processingKafka Streams: the easiest way to start with stream processing
Kafka Streams: the easiest way to start with stream processingYaroslav Tkachenko
 

What's hot (20)

Introduction to the Processor API
Introduction to the Processor APIIntroduction to the Processor API
Introduction to the Processor API
 
KDB+ Lite
KDB+ LiteKDB+ Lite
KDB+ Lite
 
TenMax Data Pipeline Experience Sharing
TenMax Data Pipeline Experience SharingTenMax Data Pipeline Experience Sharing
TenMax Data Pipeline Experience Sharing
 
KSQL - Stream Processing simplified!
KSQL - Stream Processing simplified!KSQL - Stream Processing simplified!
KSQL - Stream Processing simplified!
 
Introducing Scylla Manager: Cluster Management and Task Automation
Introducing Scylla Manager: Cluster Management and Task AutomationIntroducing Scylla Manager: Cluster Management and Task Automation
Introducing Scylla Manager: Cluster Management and Task Automation
 
KSQL: Streaming SQL for Kafka
KSQL: Streaming SQL for KafkaKSQL: Streaming SQL for Kafka
KSQL: Streaming SQL for Kafka
 
Stream Application Development with Apache Kafka
Stream Application Development with Apache KafkaStream Application Development with Apache Kafka
Stream Application Development with Apache Kafka
 
Kick your database_to_the_curb_reston_08_27_19
Kick your database_to_the_curb_reston_08_27_19Kick your database_to_the_curb_reston_08_27_19
Kick your database_to_the_curb_reston_08_27_19
 
Seastar Summit 2019 Keynote
Seastar Summit 2019 KeynoteSeastar Summit 2019 Keynote
Seastar Summit 2019 Keynote
 
Flink Forward Berlin 2017: Dr. Radu Tudoran - Huawei Cloud Stream Service in ...
Flink Forward Berlin 2017: Dr. Radu Tudoran - Huawei Cloud Stream Service in ...Flink Forward Berlin 2017: Dr. Radu Tudoran - Huawei Cloud Stream Service in ...
Flink Forward Berlin 2017: Dr. Radu Tudoran - Huawei Cloud Stream Service in ...
 
Unified Stream Processing at Scale with Apache Samza - BDS2017
Unified Stream Processing at Scale with Apache Samza - BDS2017Unified Stream Processing at Scale with Apache Samza - BDS2017
Unified Stream Processing at Scale with Apache Samza - BDS2017
 
Transforming Mobile Push Notifications with Big Data
Transforming Mobile Push Notifications with Big DataTransforming Mobile Push Notifications with Big Data
Transforming Mobile Push Notifications with Big Data
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
 
Javantura v3 - Logs – the missing gold mine – Franjo Žilić
Javantura v3 - Logs – the missing gold mine – Franjo ŽilićJavantura v3 - Logs – the missing gold mine – Franjo Žilić
Javantura v3 - Logs – the missing gold mine – Franjo Žilić
 
Tale of ISUCON and Its Bench Tools
Tale of ISUCON and Its Bench ToolsTale of ISUCON and Its Bench Tools
Tale of ISUCON and Its Bench Tools
 
Journey and evolution of Presto@Grab
Journey and evolution of Presto@GrabJourney and evolution of Presto@Grab
Journey and evolution of Presto@Grab
 
What to expect from MariaDB Platform X5, part 2
What to expect from MariaDB Platform X5, part 2What to expect from MariaDB Platform X5, part 2
What to expect from MariaDB Platform X5, part 2
 
Stream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka StreamsStream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka Streams
 
Flink Forward Berlin 2017: Stefan Richter - A look at Flink's internal data s...
Flink Forward Berlin 2017: Stefan Richter - A look at Flink's internal data s...Flink Forward Berlin 2017: Stefan Richter - A look at Flink's internal data s...
Flink Forward Berlin 2017: Stefan Richter - A look at Flink's internal data s...
 
Kafka Streams: the easiest way to start with stream processing
Kafka Streams: the easiest way to start with stream processingKafka Streams: the easiest way to start with stream processing
Kafka Streams: the easiest way to start with stream processing
 

Similar to Event Driven Microservices

Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...HostedbyConfluent
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to StreamingBravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to StreamingYaroslav Tkachenko
 
A Tour of Apache Kafka
A Tour of Apache KafkaA Tour of Apache Kafka
A Tour of Apache Kafkaconfluent
 
Netflix Keystone—Cloud scale event processing pipeline
Netflix Keystone—Cloud scale event processing pipelineNetflix Keystone—Cloud scale event processing pipeline
Netflix Keystone—Cloud scale event processing pipelineMonal Daxini
 
XStream: stream processing platform at facebook
XStream:  stream processing platform at facebookXStream:  stream processing platform at facebook
XStream: stream processing platform at facebookAniket Mokashi
 
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...javier ramirez
 
Real time data pipline with kafka streams
Real time data pipline with kafka streamsReal time data pipline with kafka streams
Real time data pipline with kafka streamsYoni Farin
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...DataWorks Summit/Hadoop Summit
 
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016Monal Daxini
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafkaSamuel Kerrien
 
AWS re:Invent presentation: Unmeltable Infrastructure at Scale by Loggly
AWS re:Invent presentation: Unmeltable Infrastructure at Scale by Loggly AWS re:Invent presentation: Unmeltable Infrastructure at Scale by Loggly
AWS re:Invent presentation: Unmeltable Infrastructure at Scale by Loggly SolarWinds Loggly
 
Kafka streams decoupling with stores
Kafka streams decoupling with storesKafka streams decoupling with stores
Kafka streams decoupling with storesYoni Farin
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using KafkaKnoldus Inc.
 
Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...
Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...
Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...Miguel Pérez Colino
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Guido Schmutz
 
How to run a bank on Apache CloudStack
How to run a bank on Apache CloudStackHow to run a bank on Apache CloudStack
How to run a bank on Apache CloudStackgjdevos
 
SamzaSQL QCon'16 presentation
SamzaSQL QCon'16 presentationSamzaSQL QCon'16 presentation
SamzaSQL QCon'16 presentationYi Pan
 
Kafka Streams - From the Ground Up to the Cloud
Kafka Streams - From the Ground Up to the CloudKafka Streams - From the Ground Up to the Cloud
Kafka Streams - From the Ground Up to the CloudVMware Tanzu
 
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...HostedbyConfluent
 
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...Amazon Web Services
 

Similar to Event Driven Microservices (20)

Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to StreamingBravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
 
A Tour of Apache Kafka
A Tour of Apache KafkaA Tour of Apache Kafka
A Tour of Apache Kafka
 
Netflix Keystone—Cloud scale event processing pipeline
Netflix Keystone—Cloud scale event processing pipelineNetflix Keystone—Cloud scale event processing pipeline
Netflix Keystone—Cloud scale event processing pipeline
 
XStream: stream processing platform at facebook
XStream:  stream processing platform at facebookXStream:  stream processing platform at facebook
XStream: stream processing platform at facebook
 
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
 
Real time data pipline with kafka streams
Real time data pipline with kafka streamsReal time data pipline with kafka streams
Real time data pipline with kafka streams
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
 
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafka
 
AWS re:Invent presentation: Unmeltable Infrastructure at Scale by Loggly
AWS re:Invent presentation: Unmeltable Infrastructure at Scale by Loggly AWS re:Invent presentation: Unmeltable Infrastructure at Scale by Loggly
AWS re:Invent presentation: Unmeltable Infrastructure at Scale by Loggly
 
Kafka streams decoupling with stores
Kafka streams decoupling with storesKafka streams decoupling with stores
Kafka streams decoupling with stores
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
 
Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...
Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...
Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
 
How to run a bank on Apache CloudStack
How to run a bank on Apache CloudStackHow to run a bank on Apache CloudStack
How to run a bank on Apache CloudStack
 
SamzaSQL QCon'16 presentation
SamzaSQL QCon'16 presentationSamzaSQL QCon'16 presentation
SamzaSQL QCon'16 presentation
 
Kafka Streams - From the Ground Up to the Cloud
Kafka Streams - From the Ground Up to the CloudKafka Streams - From the Ground Up to the Cloud
Kafka Streams - From the Ground Up to the Cloud
 
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
 
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
 

Recently uploaded

JustNaik Solution Deck (stage bus sector)
JustNaik Solution Deck (stage bus sector)JustNaik Solution Deck (stage bus sector)
JustNaik Solution Deck (stage bus sector)Max Lee
 
Secure Software Ecosystem Teqnation 2024
Secure Software Ecosystem Teqnation 2024Secure Software Ecosystem Teqnation 2024
Secure Software Ecosystem Teqnation 2024Soroosh Khodami
 
CompTIA Security+ (Study Notes) for cs.pdf
CompTIA Security+ (Study Notes) for cs.pdfCompTIA Security+ (Study Notes) for cs.pdf
CompTIA Security+ (Study Notes) for cs.pdfFurqanuddin10
 
how-to-download-files-safely-from-the-internet.pdf
how-to-download-files-safely-from-the-internet.pdfhow-to-download-files-safely-from-the-internet.pdf
how-to-download-files-safely-from-the-internet.pdfMehmet Akar
 
Workforce Efficiency with Employee Time Tracking Software.pdf
Workforce Efficiency with Employee Time Tracking Software.pdfWorkforce Efficiency with Employee Time Tracking Software.pdf
Workforce Efficiency with Employee Time Tracking Software.pdfDeskTrack
 
GraphSummit Stockholm - Neo4j - Knowledge Graphs and Product Updates
GraphSummit Stockholm - Neo4j - Knowledge Graphs and Product UpdatesGraphSummit Stockholm - Neo4j - Knowledge Graphs and Product Updates
GraphSummit Stockholm - Neo4j - Knowledge Graphs and Product UpdatesNeo4j
 
A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1
A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1
A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1KnowledgeSeed
 
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdf
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdfA Comprehensive Appium Guide for Hybrid App Automation Testing.pdf
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdfkalichargn70th171
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTier1 app
 
Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...
Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...
Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...Andrea Goulet
 
Advanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should KnowAdvanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should KnowPeter Caitens
 
10 Essential Software Testing Tools You Need to Know About.pdf
10 Essential Software Testing Tools You Need to Know About.pdf10 Essential Software Testing Tools You Need to Know About.pdf
10 Essential Software Testing Tools You Need to Know About.pdfkalichargn70th171
 
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...rajkumar669520
 
How to install and activate eGrabber JobGrabber
How to install and activate eGrabber JobGrabberHow to install and activate eGrabber JobGrabber
How to install and activate eGrabber JobGrabbereGrabber
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2
 
Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)
Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)
Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)Gáspár Nagy
 
AI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning FrameworkAI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning FrameworkAlluxio, Inc.
 
iGaming Platform & Lottery Solutions by Skilrock
iGaming Platform & Lottery Solutions by SkilrockiGaming Platform & Lottery Solutions by Skilrock
iGaming Platform & Lottery Solutions by SkilrockSkilrock Technologies
 
AI/ML Infra Meetup | ML explainability in Michelangelo
AI/ML Infra Meetup | ML explainability in MichelangeloAI/ML Infra Meetup | ML explainability in Michelangelo
AI/ML Infra Meetup | ML explainability in MichelangeloAlluxio, Inc.
 
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...Alluxio, Inc.
 

Recently uploaded (20)

JustNaik Solution Deck (stage bus sector)
JustNaik Solution Deck (stage bus sector)JustNaik Solution Deck (stage bus sector)
JustNaik Solution Deck (stage bus sector)
 
Secure Software Ecosystem Teqnation 2024
Secure Software Ecosystem Teqnation 2024Secure Software Ecosystem Teqnation 2024
Secure Software Ecosystem Teqnation 2024
 
CompTIA Security+ (Study Notes) for cs.pdf
CompTIA Security+ (Study Notes) for cs.pdfCompTIA Security+ (Study Notes) for cs.pdf
CompTIA Security+ (Study Notes) for cs.pdf
 
how-to-download-files-safely-from-the-internet.pdf
how-to-download-files-safely-from-the-internet.pdfhow-to-download-files-safely-from-the-internet.pdf
how-to-download-files-safely-from-the-internet.pdf
 
Workforce Efficiency with Employee Time Tracking Software.pdf
Workforce Efficiency with Employee Time Tracking Software.pdfWorkforce Efficiency with Employee Time Tracking Software.pdf
Workforce Efficiency with Employee Time Tracking Software.pdf
 
GraphSummit Stockholm - Neo4j - Knowledge Graphs and Product Updates
GraphSummit Stockholm - Neo4j - Knowledge Graphs and Product UpdatesGraphSummit Stockholm - Neo4j - Knowledge Graphs and Product Updates
GraphSummit Stockholm - Neo4j - Knowledge Graphs and Product Updates
 
A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1
A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1
A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1
 
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdf
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdfA Comprehensive Appium Guide for Hybrid App Automation Testing.pdf
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdf
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
 
Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...
Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...
Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...
 
Advanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should KnowAdvanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should Know
 
10 Essential Software Testing Tools You Need to Know About.pdf
10 Essential Software Testing Tools You Need to Know About.pdf10 Essential Software Testing Tools You Need to Know About.pdf
10 Essential Software Testing Tools You Need to Know About.pdf
 
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
 
How to install and activate eGrabber JobGrabber
How to install and activate eGrabber JobGrabberHow to install and activate eGrabber JobGrabber
How to install and activate eGrabber JobGrabber
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)
Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)
Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)
 
AI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning FrameworkAI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning Framework
 
iGaming Platform & Lottery Solutions by Skilrock
iGaming Platform & Lottery Solutions by SkilrockiGaming Platform & Lottery Solutions by Skilrock
iGaming Platform & Lottery Solutions by Skilrock
 
AI/ML Infra Meetup | ML explainability in Michelangelo
AI/ML Infra Meetup | ML explainability in MichelangeloAI/ML Infra Meetup | ML explainability in Michelangelo
AI/ML Infra Meetup | ML explainability in Michelangelo
 
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
 

Event Driven Microservices

  • 1. Event driven µ-services Rethinking Data and Services with Streams Dublin μServices User Group 27th September 2018 @fabriziofortino
  • 2. About me ● Staff Engineer @HBCTech ● 15+ years of experience in software development ● Open source enthusiast and contributor ● @fabriziofortino
  • 3. What this talk is about ● HBC Architecture Evolution ● Why Kafka? ● Kafka Overview ● Streaming Platform + Search + µ-services ● Use of Kafka Streams in a µ-services architecture to ○ Avoid common antipatterns ○ Simplify development experience ○ Improve resilience and performances ○ Enable experimentation
  • 4. HBC: Stores + Online Banners
  • 5. 2007 Monolith RoR application + Postgres 2010 SOA Broke up the monolith in large services 2012 µ-services Incremental introduction of µ-services (up to ~300) 2016 µ-services + ƛ Introduction of functions as a service ƛ ƛ ƛ ƛ 2018 + µ-services + streams Streaming platform to share data among services ƛ ƛ ƛ ƛ From monolith to µ-services + streams
  • 6. https://www.slideshare.net/InfoQ/ lambda-architectures-a-snapshot-a-stream-a-bunch-of-deltas * Changes are propagated in real-time to Solr * Rebuild of index (s + 횫*) with zero down time * Same logic for batch stream (thank you akka-streams) * V.O.T.: “We needed a relational DB to solve a relationa problem” Search: A Snapshot, a Stream, a Bunch of Deltas Kinesis Calatrava 횫 S3 Brands, products, sales, channels, ... s 횫VOT View of Truth - PG svc-search -feed Source of Truth - PG admin
  • 7.
  • 8.
  • 10. Kafka Topics Anatomy and Log Compaction 0 1 2 3 4 5 6 7 8 9 10 11 k0 k1 k2 k1 k4 k5 k0 k7 k8 k9 k10 k10 foo bar baz qux quix conge grault garply waldo fred plugh xyzzy Producers Consumer A (offset = 6) Consumer B (offset = 11) OFFSET KEY VALUE Topic T1 - Partition 0 2 3 4 5 6 7 8 9 11 k2 k1 k4 k5 k0 k7 k8 k9 k10 baz qux quix conge grault garply waldo fred xyzzy OFFSET KEY VALUE Topic T1 - Partition 0 (after log compaction)
  • 11.
  • 12. The Kafka Ecosystem ● Connect: to copy data between Kafka and another system ○ Sources: import data (eg: Postgres, MySQL, S3, Kinesis, etc) ○ Sinks: export data (eg: Postgres, Elasticsearch, Solr, etc) ● Kafka Streams: client library for building mission-critical real-time applications ● Schema Registry: metadata serving layer for storing and retrieving AVRO schemas. Allows evolution of schemas ● KSQL: streaming SQL engine ● Kubernetes Operator: simplify provisioning and operational burden
  • 13. Kafka Streams ● Cluster/Framework free tiny client library (=~ 800 KB) ● Elastic, highly scalable, fault-tolerant ● Deployable as a standard Java/Scala application ● Built-in abstractions for streams ↔ table duality ● Declarative functional DSL with support for ○ Transformations (eg: filter, map, flatMap) ○ Aggregations (eg: count, reduce, groupBy) ○ Joins (eg: leftJoin, outerJoin) ○ Windowing (session, sliding time) ● Internal key-value state store (in-memory or disk-backed based on RocksDB) used for buffering, aggregations, interactive queries
  • 14. Streaming Platform + Search + µ-services product inventory pricing OtherSystems Streaming Platform OtherSystems Applications Search App Kafka Streams Connect Connect web-pdp µ-services web-homepage µ-services
  • 15. The data dichotomy between monoliths and µ-services Monolith Database product-svc inventory-svc Database Database web-pdp web-pdp Interface amplifies data Interface hides data
  • 16. µ-services antipatterns 1/2: The God Service A data service that grows exposing an increasing set of functions to the point where it starts look like a homegrown database ○ getProduct(id) ○ getAllProducts(saleId) ○ getAllAvailableProducts(saleId) ○ getAllActiveProducts() ○ getSku(id) ○ getAllSkusByProduct()
  • 17. µ-services antipatterns 2/2: REST-to-ETL problem When it’s preferable to extract the data from a data service and keep it local for different reasons: ● Aggregation: data needs to be combined with another dataset ● Caching: data needs to be closer to get better performances ● Ownership: the data services provide limited functionalities and can’t be changed quickly enough
  • 18. In the past we used caches to mitigate issues, but . . . product-service inventory-service web-pdp commons lib product cache * Cache refresh every 20 min * Fast response time (data locally available) * JSON from product service 1Gb * Startup time 10m * JVM GC every 20m on cache clear * m4.xlarge, w/ 14Gb JVM Heap Take 1: on heap caching web-pdp commons lib * Startup Time in seconds * No more stop-the-world GC * c4.xlarge (CPU!!!), w/ 6Gb JVM Heap * Performance degradation product-service inventory-service Elasticacache Take 2: centralized Elasticache L1 CACHE
  • 19. Solution: Kafka + Kafka Streams (aka turning the DB inside out) Commit Log Indexes Caching Query Engine Kafka Streams
  • 20. web-pdp - streaming topology 1/2 val builder = new StreamsBuilder() def inventoriesStreamToTable(): KTable[InventoryKey, Inventory] = { implicit val consumedInv: Consumed[db.catalog.InventoryKey, db.catalog.Inventory] = Consumed.`with`(inventoryTopicConfig.keySerde, inventoryTopicConfig.valueSerde) builder.table(inventories) } def productStreamToTable(): KTable[ProductKey, catalog.Product] = { implicit val consumedProducts: Consumed[db.catalog.ProductKey, db.catalog.Product] = Consumed.`with`(productTopicConfig.keySerde, productTopicConfig.valueSerde) builder.table(products) } val inventoriesTable = inventoriesStreamToTable() val productsTable = productStreamToTable() sku101 sku201 sku102 sku101 pId=p01 value=5 pId=p02 value=2 pId=p01 value=1 pId=p01 value=4 sku-id value sku201 pId=p02 value=2 sku102 pId=p01 value=1 sku101 pId=p01 value=4 Inventory Topic (kafka) Inventory KTable (web-pdp local store) p01 p02 p03 p02 Red Shoes Blak Dress White Scarf Black Dress p-id value p01 Red Shoes p03 White Scarf p02 Black Dress Products Topic (kafka) Product KTable (web-pdp local store)
  • 21. web-pdp - streaming topology 2/2 val inventoriesByProduct: KTable[ProductKey, Inventories] = inventoriesTable .groupBy((inventoryKey, inventory) = (ProductKey(inventoryKey.productId), inventory)) ) .aggregate[Inventories]( () = Inventories(List[Inventory]()), (_: ProductKey, inv: Inventory, acc: Inventories) = Inventories(inv :: acc.items), (_: ProductKey, inv: Inventory, acc: Inventories) = Inventories(acc.items.filter(_.skuId != inv.skuId)) ) def materializeView(product: Product, inventories: Inventories): ProductEnriched = ??? productsTable .join( inventoriesByProduct, materializeView, Materialized.as(product-enriched) ) val kStreams = new KafkaStreams(builder.build(), streamsConfig) kStreams.start() sku-id value sku201 pId=p02 value=2 sku102 pId=p01 value=1 sku101 pId=p01 value=4 Inventory KTable (web-pdp local store) p-id value p02 [sku=sku201, value=2] p01 [sku=sku102, value=1 sku=sku101, value=4]] Inventory By Product KTable (web-pdp local store) p-id value p02 Black Dress [sku=sku201, value=2] p01 Red Shoes [sku=sku102, value=1 sku=sku101, value=4]] p03 White Scarf products-enriched (web-pdp local store) p-id value p01 Red Shoes p03 White Scarf p02 Black Dress Product KTable (web-pdp local store)
  • 22. web-pdp - Interactive queries val store: ReadOnlyKeyValueStore[ProductKey, ProductEnriched] = kStreams.store(product-enriched, QueryableStoreTypes.keyValueStore()) val productEnriched = store.get(new ProductKey(p01)) p-id value p02 Black Dress [sku=sku201, value=2] p01 Red Shoes [sku=sku102, value=1 sku=sku101, value=4]] p03 White Scarf products-enriched (web-pdp local store) * Startup Time 2 min (single node) * Really fast response time: data is local and fully precomputed * No dependencies to other services - less things can go wrong * Centralized monitoring / alerting
  • 23. Summary ● Lambda architecture works well but the implementation is not trivial ● Stream processing introduces a new programming paradigm ● Use the schema registry from day 1 to support schema changes compatibility and avoid to break downstream consumers ● A replayable log (Kafka) and a streaming library (Kafka Streams) give the freedom to slice, dice, enrich and evolve data locally as it arrives increasing resilience and performance
  • 24. Books
  • 25. Resources ● Data on the Outside versus Data on the Inside [P. Helland - 2005] ● The Log: What every software engineer should know about real-time data's unifying abstraction [J. Kreps - 2013] ● Questioning the Lambda Architecture [J. Kreps - 2014] ● Turning the database inside-out with Apache Samza [M. Kleppmann - 2015] ● The Data Dichotomy: Rethinking the Way We Treat Data and Services [B. Stopford - 2016] ● Introducing Kafka Streams: Stream Processing Made Simple [J. Kreps - 2016] ● Building a µ-services ecosystem with Kafka Stream and KSQL [B. Stopford - 2017] ● Streams and Tables: Two Sides of the Same Coin [Sax - Weidlich - Wang - Freytag - 2018]