SlideShare a Scribd company logo
1 of 26
Download to read offline
Event driven
µ-services
Rethinking Data and
Services with Streams
Dublin μServices User Group
27th September 2018
@fabriziofortino
About me
● Staff Engineer @HBCTech
● 15+ years of experience in
software development
● Open source enthusiast and
contributor
● @fabriziofortino
What this talk is about
● HBC Architecture Evolution
● Why Kafka?
● Kafka Overview
● Streaming Platform + Search + µ-services
● Use of Kafka Streams in a µ-services architecture to
○ Avoid common antipatterns
○ Simplify development experience
○ Improve resilience and performances
○ Enable experimentation
HBC: Stores + Online Banners
2007
Monolith
RoR application +
Postgres
2010
SOA
Broke up the
monolith in large
services
2012
µ-services
Incremental introduction
of µ-services (up to
~300)
2016
µ-services + ƛ
Introduction of
functions as a
service
ƛ ƛ
ƛ ƛ
2018 +
µ-services +
streams
Streaming platform
to share data among
services
ƛ ƛ
ƛ ƛ
From monolith to µ-services + streams
https://www.slideshare.net/InfoQ/
lambda-architectures-a-snapshot-a-stream-a-bunch-of-deltas
* Changes are propagated in real-time to Solr
* Rebuild of index (s + 횫*) with zero down time
* Same logic for batch  stream (thank you
akka-streams)
* V.O.T.: “We needed a relational DB to solve a relationa
problem”
Search: A Snapshot, a Stream,  a Bunch of Deltas
Kinesis
Calatrava
횫
S3
Brands, products,
sales, channels, ...
s
횫VOT
View of Truth - PG
svc-search
-feed
Source of Truth - PG
admin
Hello Kafka!
Kafka Topics Anatomy and Log Compaction
0 1 2 3 4 5 6 7 8 9 10 11
k0 k1 k2 k1 k4 k5 k0 k7 k8 k9 k10 k10
foo bar baz qux quix conge grault garply waldo fred plugh xyzzy
Producers
Consumer A
(offset = 6)
Consumer B
(offset = 11)
OFFSET
KEY
VALUE
Topic T1 - Partition 0
2 3 4 5 6 7 8 9 11
k2 k1 k4 k5 k0 k7 k8 k9 k10
baz qux quix conge grault garply waldo fred xyzzy
OFFSET
KEY
VALUE
Topic T1 - Partition 0
(after log compaction)
The Kafka Ecosystem
● Connect: to copy data between Kafka and another system
○ Sources: import data (eg: Postgres, MySQL, S3, Kinesis, etc)
○ Sinks: export data (eg: Postgres, Elasticsearch, Solr, etc)
● Kafka Streams: client library for building mission-critical real-time
applications
● Schema Registry: metadata serving layer for storing and retrieving
AVRO schemas. Allows evolution of schemas
● KSQL: streaming SQL engine
● Kubernetes Operator: simplify provisioning and operational burden
Kafka Streams
● Cluster/Framework free tiny client library (=~ 800 KB)
● Elastic, highly scalable, fault-tolerant
● Deployable as a standard Java/Scala application
● Built-in abstractions for streams ↔ table duality
● Declarative functional DSL with support for
○ Transformations (eg: filter, map, flatMap)
○ Aggregations (eg: count, reduce, groupBy)
○ Joins (eg: leftJoin, outerJoin)
○ Windowing (session, sliding time)
● Internal key-value state store (in-memory or disk-backed based on
RocksDB) used for buffering, aggregations, interactive queries
Streaming Platform + Search + µ-services
product
inventory
pricing
OtherSystems
Streaming Platform
OtherSystems
Applications
Search App
Kafka Streams
Connect
Connect
web-pdp
µ-services
web-homepage
µ-services
The data dichotomy between monoliths and µ-services
Monolith Database
product-svc
inventory-svc
Database
Database
web-pdp
web-pdp
Interface amplifies data
Interface hides data
µ-services antipatterns 1/2: The God Service
A data service that grows exposing an increasing set of functions to the
point where it starts look like a homegrown database
○ getProduct(id)
○ getAllProducts(saleId)
○ getAllAvailableProducts(saleId)
○ getAllActiveProducts()
○ getSku(id)
○ getAllSkusByProduct()
µ-services antipatterns 2/2: REST-to-ETL problem
When it’s preferable to extract the data from a data service and keep it
local for different reasons:
● Aggregation: data needs to be combined with another dataset
● Caching: data needs to be closer to get better performances
● Ownership: the data services provide limited functionalities and can’t
be changed quickly enough
In the past we used caches to mitigate issues, but . . .
product-service inventory-service
web-pdp
commons lib
product cache
* Cache refresh every 20 min
* Fast response time (data locally
available)
* JSON from product service  1Gb
* Startup time  10m
* JVM GC every 20m on cache clear
* m4.xlarge, w/ 14Gb JVM Heap
Take 1: on heap caching
web-pdp
commons lib
* Startup Time in seconds
* No more stop-the-world GC
* c4.xlarge (CPU!!!), w/ 6Gb JVM Heap
* Performance degradation
product-service
inventory-service
Elasticacache
Take 2: centralized Elasticache
L1 CACHE
Solution: Kafka + Kafka Streams
(aka turning the DB inside out)
Commit Log
Indexes
Caching
Query Engine
Kafka Streams
web-pdp - streaming topology 1/2
val builder = new StreamsBuilder()
def inventoriesStreamToTable(): KTable[InventoryKey, Inventory] = {
implicit val consumedInv: Consumed[db.catalog.InventoryKey, db.catalog.Inventory] =
Consumed.`with`(inventoryTopicConfig.keySerde, inventoryTopicConfig.valueSerde)
builder.table(inventories)
}
def productStreamToTable(): KTable[ProductKey, catalog.Product] = {
implicit val consumedProducts: Consumed[db.catalog.ProductKey, db.catalog.Product] =
Consumed.`with`(productTopicConfig.keySerde, productTopicConfig.valueSerde)
builder.table(products)
}
val inventoriesTable = inventoriesStreamToTable()
val productsTable = productStreamToTable()
sku101 sku201 sku102 sku101
pId=p01
value=5
pId=p02
value=2
pId=p01
value=1
pId=p01
value=4
sku-id value
sku201 pId=p02
value=2
sku102 pId=p01
value=1
sku101 pId=p01
value=4
Inventory Topic
(kafka)
Inventory KTable
(web-pdp local store)
p01 p02 p03 p02
Red
Shoes
Blak
Dress
White
Scarf
Black
Dress
p-id value
p01 Red
Shoes
p03 White
Scarf
p02 Black
Dress
Products Topic
(kafka)
Product KTable
(web-pdp local store)
web-pdp - streaming topology 2/2
val inventoriesByProduct: KTable[ProductKey, Inventories] = inventoriesTable
.groupBy((inventoryKey, inventory) =
(ProductKey(inventoryKey.productId), inventory))
)
.aggregate[Inventories](
() = Inventories(List[Inventory]()),
(_: ProductKey, inv: Inventory, acc: Inventories) = Inventories(inv :: acc.items),
(_: ProductKey, inv: Inventory, acc: Inventories) = Inventories(acc.items.filter(_.skuId !=
inv.skuId))
)
def materializeView(product: Product, inventories: Inventories): ProductEnriched = ???
productsTable
.join(
inventoriesByProduct,
materializeView,
Materialized.as(product-enriched)
)
val kStreams = new KafkaStreams(builder.build(), streamsConfig)
kStreams.start()
sku-id value
sku201 pId=p02
value=2
sku102 pId=p01
value=1
sku101 pId=p01
value=4
Inventory KTable
(web-pdp local store)
p-id value
p02 [sku=sku201, value=2]
p01 [sku=sku102, value=1
sku=sku101, value=4]]
Inventory By Product KTable
(web-pdp local store)
p-id value
p02 Black Dress
[sku=sku201, value=2]
p01 Red Shoes
[sku=sku102, value=1
sku=sku101, value=4]]
p03 White Scarf
products-enriched
(web-pdp local store)
p-id value
p01 Red
Shoes
p03 White
Scarf
p02 Black
Dress
Product KTable
(web-pdp local store)
web-pdp - Interactive queries
val store: ReadOnlyKeyValueStore[ProductKey, ProductEnriched] =
kStreams.store(product-enriched, QueryableStoreTypes.keyValueStore())
val productEnriched = store.get(new ProductKey(p01))
p-id value
p02 Black Dress
[sku=sku201, value=2]
p01 Red Shoes
[sku=sku102, value=1
sku=sku101, value=4]]
p03 White Scarf
products-enriched
(web-pdp local store)
* Startup Time  2 min (single node)
* Really fast response time: data is
local and fully precomputed
* No dependencies to other services -
less things can go wrong
* Centralized monitoring / alerting
Summary
● Lambda architecture works well but the implementation is not trivial
● Stream processing introduces a new programming paradigm
● Use the schema registry from day 1 to support schema changes
compatibility and avoid to break downstream consumers
● A replayable log (Kafka) and a streaming library (Kafka Streams) give the
freedom to slice, dice, enrich and evolve data locally as it arrives
increasing resilience and performance
Books
Resources
● Data on the Outside versus Data on the Inside [P. Helland - 2005]
● The Log: What every software engineer should know about real-time data's
unifying abstraction [J. Kreps - 2013]
● Questioning the Lambda Architecture [J. Kreps - 2014]
● Turning the database inside-out with Apache Samza [M. Kleppmann - 2015]
● The Data Dichotomy: Rethinking the Way We Treat Data and Services [B.
Stopford - 2016]
● Introducing Kafka Streams: Stream Processing Made Simple [J. Kreps - 2016]
● Building a µ-services ecosystem with Kafka Stream and KSQL [B. Stopford -
2017]
● Streams and Tables: Two Sides of the Same Coin [Sax - Weidlich - Wang - Freytag
- 2018]
Event driven μ-services and streaming data architectures

More Related Content

What's hot

Introduction to the Processor API
Introduction to the Processor APIIntroduction to the Processor API
Introduction to the Processor APIconfluent
 
TenMax Data Pipeline Experience Sharing
TenMax Data Pipeline Experience SharingTenMax Data Pipeline Experience Sharing
TenMax Data Pipeline Experience SharingChen-en Lu
 
KSQL - Stream Processing simplified!
KSQL - Stream Processing simplified!KSQL - Stream Processing simplified!
KSQL - Stream Processing simplified!Guido Schmutz
 
Introducing Scylla Manager: Cluster Management and Task Automation
Introducing Scylla Manager: Cluster Management and Task AutomationIntroducing Scylla Manager: Cluster Management and Task Automation
Introducing Scylla Manager: Cluster Management and Task AutomationScyllaDB
 
KSQL: Streaming SQL for Kafka
KSQL: Streaming SQL for KafkaKSQL: Streaming SQL for Kafka
KSQL: Streaming SQL for Kafkaconfluent
 
Stream Application Development with Apache Kafka
Stream Application Development with Apache KafkaStream Application Development with Apache Kafka
Stream Application Development with Apache KafkaMatthias J. Sax
 
Kick your database_to_the_curb_reston_08_27_19
Kick your database_to_the_curb_reston_08_27_19Kick your database_to_the_curb_reston_08_27_19
Kick your database_to_the_curb_reston_08_27_19confluent
 
Seastar Summit 2019 Keynote
Seastar Summit 2019 KeynoteSeastar Summit 2019 Keynote
Seastar Summit 2019 KeynoteScyllaDB
 
Flink Forward Berlin 2017: Dr. Radu Tudoran - Huawei Cloud Stream Service in ...
Flink Forward Berlin 2017: Dr. Radu Tudoran - Huawei Cloud Stream Service in ...Flink Forward Berlin 2017: Dr. Radu Tudoran - Huawei Cloud Stream Service in ...
Flink Forward Berlin 2017: Dr. Radu Tudoran - Huawei Cloud Stream Service in ...Flink Forward
 
Unified Stream Processing at Scale with Apache Samza - BDS2017
Unified Stream Processing at Scale with Apache Samza - BDS2017Unified Stream Processing at Scale with Apache Samza - BDS2017
Unified Stream Processing at Scale with Apache Samza - BDS2017Jacob Maes
 
Transforming Mobile Push Notifications with Big Data
Transforming Mobile Push Notifications with Big DataTransforming Mobile Push Notifications with Big Data
Transforming Mobile Push Notifications with Big Dataplumbee
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Guido Schmutz
 
Tale of ISUCON and Its Bench Tools
Tale of ISUCON and Its Bench ToolsTale of ISUCON and Its Bench Tools
Tale of ISUCON and Its Bench ToolsSATOSHI TAGOMORI
 
Journey and evolution of Presto@Grab
Journey and evolution of Presto@GrabJourney and evolution of Presto@Grab
Journey and evolution of Presto@GrabShubham Tagra
 
What to expect from MariaDB Platform X5, part 2
What to expect from MariaDB Platform X5, part 2What to expect from MariaDB Platform X5, part 2
What to expect from MariaDB Platform X5, part 2MariaDB plc
 
Stream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka StreamsStream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka StreamsTim Ysewyn
 
Flink Forward Berlin 2017: Stefan Richter - A look at Flink's internal data s...
Flink Forward Berlin 2017: Stefan Richter - A look at Flink's internal data s...Flink Forward Berlin 2017: Stefan Richter - A look at Flink's internal data s...
Flink Forward Berlin 2017: Stefan Richter - A look at Flink's internal data s...Flink Forward
 
Kafka Streams: the easiest way to start with stream processing
Kafka Streams: the easiest way to start with stream processingKafka Streams: the easiest way to start with stream processing
Kafka Streams: the easiest way to start with stream processingYaroslav Tkachenko
 

What's hot (20)

Introduction to the Processor API
Introduction to the Processor APIIntroduction to the Processor API
Introduction to the Processor API
 
KDB+ Lite
KDB+ LiteKDB+ Lite
KDB+ Lite
 
TenMax Data Pipeline Experience Sharing
TenMax Data Pipeline Experience SharingTenMax Data Pipeline Experience Sharing
TenMax Data Pipeline Experience Sharing
 
KSQL - Stream Processing simplified!
KSQL - Stream Processing simplified!KSQL - Stream Processing simplified!
KSQL - Stream Processing simplified!
 
Introducing Scylla Manager: Cluster Management and Task Automation
Introducing Scylla Manager: Cluster Management and Task AutomationIntroducing Scylla Manager: Cluster Management and Task Automation
Introducing Scylla Manager: Cluster Management and Task Automation
 
KSQL: Streaming SQL for Kafka
KSQL: Streaming SQL for KafkaKSQL: Streaming SQL for Kafka
KSQL: Streaming SQL for Kafka
 
Stream Application Development with Apache Kafka
Stream Application Development with Apache KafkaStream Application Development with Apache Kafka
Stream Application Development with Apache Kafka
 
Kick your database_to_the_curb_reston_08_27_19
Kick your database_to_the_curb_reston_08_27_19Kick your database_to_the_curb_reston_08_27_19
Kick your database_to_the_curb_reston_08_27_19
 
Seastar Summit 2019 Keynote
Seastar Summit 2019 KeynoteSeastar Summit 2019 Keynote
Seastar Summit 2019 Keynote
 
Flink Forward Berlin 2017: Dr. Radu Tudoran - Huawei Cloud Stream Service in ...
Flink Forward Berlin 2017: Dr. Radu Tudoran - Huawei Cloud Stream Service in ...Flink Forward Berlin 2017: Dr. Radu Tudoran - Huawei Cloud Stream Service in ...
Flink Forward Berlin 2017: Dr. Radu Tudoran - Huawei Cloud Stream Service in ...
 
Unified Stream Processing at Scale with Apache Samza - BDS2017
Unified Stream Processing at Scale with Apache Samza - BDS2017Unified Stream Processing at Scale with Apache Samza - BDS2017
Unified Stream Processing at Scale with Apache Samza - BDS2017
 
Transforming Mobile Push Notifications with Big Data
Transforming Mobile Push Notifications with Big DataTransforming Mobile Push Notifications with Big Data
Transforming Mobile Push Notifications with Big Data
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
 
Javantura v3 - Logs – the missing gold mine – Franjo Žilić
Javantura v3 - Logs – the missing gold mine – Franjo ŽilićJavantura v3 - Logs – the missing gold mine – Franjo Žilić
Javantura v3 - Logs – the missing gold mine – Franjo Žilić
 
Tale of ISUCON and Its Bench Tools
Tale of ISUCON and Its Bench ToolsTale of ISUCON and Its Bench Tools
Tale of ISUCON and Its Bench Tools
 
Journey and evolution of Presto@Grab
Journey and evolution of Presto@GrabJourney and evolution of Presto@Grab
Journey and evolution of Presto@Grab
 
What to expect from MariaDB Platform X5, part 2
What to expect from MariaDB Platform X5, part 2What to expect from MariaDB Platform X5, part 2
What to expect from MariaDB Platform X5, part 2
 
Stream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka StreamsStream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka Streams
 
Flink Forward Berlin 2017: Stefan Richter - A look at Flink's internal data s...
Flink Forward Berlin 2017: Stefan Richter - A look at Flink's internal data s...Flink Forward Berlin 2017: Stefan Richter - A look at Flink's internal data s...
Flink Forward Berlin 2017: Stefan Richter - A look at Flink's internal data s...
 
Kafka Streams: the easiest way to start with stream processing
Kafka Streams: the easiest way to start with stream processingKafka Streams: the easiest way to start with stream processing
Kafka Streams: the easiest way to start with stream processing
 

Similar to Event driven μ-services and streaming data architectures

Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to StreamingBravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to StreamingYaroslav Tkachenko
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...HostedbyConfluent
 
A Tour of Apache Kafka
A Tour of Apache KafkaA Tour of Apache Kafka
A Tour of Apache Kafkaconfluent
 
Netflix Keystone—Cloud scale event processing pipeline
Netflix Keystone—Cloud scale event processing pipelineNetflix Keystone—Cloud scale event processing pipeline
Netflix Keystone—Cloud scale event processing pipelineMonal Daxini
 
XStream: stream processing platform at facebook
XStream:  stream processing platform at facebookXStream:  stream processing platform at facebook
XStream: stream processing platform at facebookAniket Mokashi
 
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...javier ramirez
 
Real time data pipline with kafka streams
Real time data pipline with kafka streamsReal time data pipline with kafka streams
Real time data pipline with kafka streamsYoni Farin
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...DataWorks Summit/Hadoop Summit
 
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016Monal Daxini
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafkaSamuel Kerrien
 
AWS re:Invent presentation: Unmeltable Infrastructure at Scale by Loggly
AWS re:Invent presentation: Unmeltable Infrastructure at Scale by Loggly AWS re:Invent presentation: Unmeltable Infrastructure at Scale by Loggly
AWS re:Invent presentation: Unmeltable Infrastructure at Scale by Loggly SolarWinds Loggly
 
Kafka streams decoupling with stores
Kafka streams decoupling with storesKafka streams decoupling with stores
Kafka streams decoupling with storesYoni Farin
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using KafkaKnoldus Inc.
 
Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...
Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...
Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...Miguel Pérez Colino
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Guido Schmutz
 
How to run a bank on Apache CloudStack
How to run a bank on Apache CloudStackHow to run a bank on Apache CloudStack
How to run a bank on Apache CloudStackgjdevos
 
SamzaSQL QCon'16 presentation
SamzaSQL QCon'16 presentationSamzaSQL QCon'16 presentation
SamzaSQL QCon'16 presentationYi Pan
 
Kafka Streams - From the Ground Up to the Cloud
Kafka Streams - From the Ground Up to the CloudKafka Streams - From the Ground Up to the Cloud
Kafka Streams - From the Ground Up to the CloudVMware Tanzu
 
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...HostedbyConfluent
 
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...Amazon Web Services
 

Similar to Event driven μ-services and streaming data architectures (20)

Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to StreamingBravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
 
A Tour of Apache Kafka
A Tour of Apache KafkaA Tour of Apache Kafka
A Tour of Apache Kafka
 
Netflix Keystone—Cloud scale event processing pipeline
Netflix Keystone—Cloud scale event processing pipelineNetflix Keystone—Cloud scale event processing pipeline
Netflix Keystone—Cloud scale event processing pipeline
 
XStream: stream processing platform at facebook
XStream:  stream processing platform at facebookXStream:  stream processing platform at facebook
XStream: stream processing platform at facebook
 
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
 
Real time data pipline with kafka streams
Real time data pipline with kafka streamsReal time data pipline with kafka streams
Real time data pipline with kafka streams
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
 
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafka
 
AWS re:Invent presentation: Unmeltable Infrastructure at Scale by Loggly
AWS re:Invent presentation: Unmeltable Infrastructure at Scale by Loggly AWS re:Invent presentation: Unmeltable Infrastructure at Scale by Loggly
AWS re:Invent presentation: Unmeltable Infrastructure at Scale by Loggly
 
Kafka streams decoupling with stores
Kafka streams decoupling with storesKafka streams decoupling with stores
Kafka streams decoupling with stores
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
 
Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...
Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...
Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
 
How to run a bank on Apache CloudStack
How to run a bank on Apache CloudStackHow to run a bank on Apache CloudStack
How to run a bank on Apache CloudStack
 
SamzaSQL QCon'16 presentation
SamzaSQL QCon'16 presentationSamzaSQL QCon'16 presentation
SamzaSQL QCon'16 presentation
 
Kafka Streams - From the Ground Up to the Cloud
Kafka Streams - From the Ground Up to the CloudKafka Streams - From the Ground Up to the Cloud
Kafka Streams - From the Ground Up to the Cloud
 
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
 
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
 

Recently uploaded

Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slidesvaideheekore1
 
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdfPros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdfkalichargn70th171
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogueitservices996
 
Effectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorEffectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorTier1 app
 
The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...
The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...
The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...kalichargn70th171
 
eSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolseSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolsosttopstonverter
 
What’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesWhat’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesVictoriaMetrics
 
Advantages of Cargo Cloud Solutions.pptx
Advantages of Cargo Cloud Solutions.pptxAdvantages of Cargo Cloud Solutions.pptx
Advantages of Cargo Cloud Solutions.pptxRTS corp
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsJean Silva
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingShane Coughlan
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...OnePlan Solutions
 
Understanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptxUnderstanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptxSasikiranMarri
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptxVinzoCenzo
 
Zer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdfZer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdfmaor17
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jNeo4j
 
Effort Estimation Techniques used in Software Projects
Effort Estimation Techniques used in Software ProjectsEffort Estimation Techniques used in Software Projects
Effort Estimation Techniques used in Software ProjectsDEEPRAJ PATHAK
 
Mastering Project Planning with Microsoft Project 2016.pptx
Mastering Project Planning with Microsoft Project 2016.pptxMastering Project Planning with Microsoft Project 2016.pptx
Mastering Project Planning with Microsoft Project 2016.pptxAS Design & AST.
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shardsChristopher Curtin
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfRTS corp
 
SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?Alexandre Beguel
 

Recently uploaded (20)

Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slides
 
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdfPros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogue
 
Effectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorEffectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryError
 
The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...
The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...
The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...
 
eSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolseSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration tools
 
What’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesWhat’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 Updates
 
Advantages of Cargo Cloud Solutions.pptx
Advantages of Cargo Cloud Solutions.pptxAdvantages of Cargo Cloud Solutions.pptx
Advantages of Cargo Cloud Solutions.pptx
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero results
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
 
Understanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptxUnderstanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptx
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptx
 
Zer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdfZer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdf
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
 
Effort Estimation Techniques used in Software Projects
Effort Estimation Techniques used in Software ProjectsEffort Estimation Techniques used in Software Projects
Effort Estimation Techniques used in Software Projects
 
Mastering Project Planning with Microsoft Project 2016.pptx
Mastering Project Planning with Microsoft Project 2016.pptxMastering Project Planning with Microsoft Project 2016.pptx
Mastering Project Planning with Microsoft Project 2016.pptx
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
 
SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?
 

Event driven μ-services and streaming data architectures

  • 1. Event driven µ-services Rethinking Data and Services with Streams Dublin μServices User Group 27th September 2018 @fabriziofortino
  • 2. About me ● Staff Engineer @HBCTech ● 15+ years of experience in software development ● Open source enthusiast and contributor ● @fabriziofortino
  • 3. What this talk is about ● HBC Architecture Evolution ● Why Kafka? ● Kafka Overview ● Streaming Platform + Search + µ-services ● Use of Kafka Streams in a µ-services architecture to ○ Avoid common antipatterns ○ Simplify development experience ○ Improve resilience and performances ○ Enable experimentation
  • 4. HBC: Stores + Online Banners
  • 5. 2007 Monolith RoR application + Postgres 2010 SOA Broke up the monolith in large services 2012 µ-services Incremental introduction of µ-services (up to ~300) 2016 µ-services + ƛ Introduction of functions as a service ƛ ƛ ƛ ƛ 2018 + µ-services + streams Streaming platform to share data among services ƛ ƛ ƛ ƛ From monolith to µ-services + streams
  • 6. https://www.slideshare.net/InfoQ/ lambda-architectures-a-snapshot-a-stream-a-bunch-of-deltas * Changes are propagated in real-time to Solr * Rebuild of index (s + 횫*) with zero down time * Same logic for batch stream (thank you akka-streams) * V.O.T.: “We needed a relational DB to solve a relationa problem” Search: A Snapshot, a Stream, a Bunch of Deltas Kinesis Calatrava 횫 S3 Brands, products, sales, channels, ... s 횫VOT View of Truth - PG svc-search -feed Source of Truth - PG admin
  • 7.
  • 8.
  • 10. Kafka Topics Anatomy and Log Compaction 0 1 2 3 4 5 6 7 8 9 10 11 k0 k1 k2 k1 k4 k5 k0 k7 k8 k9 k10 k10 foo bar baz qux quix conge grault garply waldo fred plugh xyzzy Producers Consumer A (offset = 6) Consumer B (offset = 11) OFFSET KEY VALUE Topic T1 - Partition 0 2 3 4 5 6 7 8 9 11 k2 k1 k4 k5 k0 k7 k8 k9 k10 baz qux quix conge grault garply waldo fred xyzzy OFFSET KEY VALUE Topic T1 - Partition 0 (after log compaction)
  • 11.
  • 12. The Kafka Ecosystem ● Connect: to copy data between Kafka and another system ○ Sources: import data (eg: Postgres, MySQL, S3, Kinesis, etc) ○ Sinks: export data (eg: Postgres, Elasticsearch, Solr, etc) ● Kafka Streams: client library for building mission-critical real-time applications ● Schema Registry: metadata serving layer for storing and retrieving AVRO schemas. Allows evolution of schemas ● KSQL: streaming SQL engine ● Kubernetes Operator: simplify provisioning and operational burden
  • 13. Kafka Streams ● Cluster/Framework free tiny client library (=~ 800 KB) ● Elastic, highly scalable, fault-tolerant ● Deployable as a standard Java/Scala application ● Built-in abstractions for streams ↔ table duality ● Declarative functional DSL with support for ○ Transformations (eg: filter, map, flatMap) ○ Aggregations (eg: count, reduce, groupBy) ○ Joins (eg: leftJoin, outerJoin) ○ Windowing (session, sliding time) ● Internal key-value state store (in-memory or disk-backed based on RocksDB) used for buffering, aggregations, interactive queries
  • 14. Streaming Platform + Search + µ-services product inventory pricing OtherSystems Streaming Platform OtherSystems Applications Search App Kafka Streams Connect Connect web-pdp µ-services web-homepage µ-services
  • 15. The data dichotomy between monoliths and µ-services Monolith Database product-svc inventory-svc Database Database web-pdp web-pdp Interface amplifies data Interface hides data
  • 16. µ-services antipatterns 1/2: The God Service A data service that grows exposing an increasing set of functions to the point where it starts look like a homegrown database ○ getProduct(id) ○ getAllProducts(saleId) ○ getAllAvailableProducts(saleId) ○ getAllActiveProducts() ○ getSku(id) ○ getAllSkusByProduct()
  • 17. µ-services antipatterns 2/2: REST-to-ETL problem When it’s preferable to extract the data from a data service and keep it local for different reasons: ● Aggregation: data needs to be combined with another dataset ● Caching: data needs to be closer to get better performances ● Ownership: the data services provide limited functionalities and can’t be changed quickly enough
  • 18. In the past we used caches to mitigate issues, but . . . product-service inventory-service web-pdp commons lib product cache * Cache refresh every 20 min * Fast response time (data locally available) * JSON from product service 1Gb * Startup time 10m * JVM GC every 20m on cache clear * m4.xlarge, w/ 14Gb JVM Heap Take 1: on heap caching web-pdp commons lib * Startup Time in seconds * No more stop-the-world GC * c4.xlarge (CPU!!!), w/ 6Gb JVM Heap * Performance degradation product-service inventory-service Elasticacache Take 2: centralized Elasticache L1 CACHE
  • 19. Solution: Kafka + Kafka Streams (aka turning the DB inside out) Commit Log Indexes Caching Query Engine Kafka Streams
  • 20. web-pdp - streaming topology 1/2 val builder = new StreamsBuilder() def inventoriesStreamToTable(): KTable[InventoryKey, Inventory] = { implicit val consumedInv: Consumed[db.catalog.InventoryKey, db.catalog.Inventory] = Consumed.`with`(inventoryTopicConfig.keySerde, inventoryTopicConfig.valueSerde) builder.table(inventories) } def productStreamToTable(): KTable[ProductKey, catalog.Product] = { implicit val consumedProducts: Consumed[db.catalog.ProductKey, db.catalog.Product] = Consumed.`with`(productTopicConfig.keySerde, productTopicConfig.valueSerde) builder.table(products) } val inventoriesTable = inventoriesStreamToTable() val productsTable = productStreamToTable() sku101 sku201 sku102 sku101 pId=p01 value=5 pId=p02 value=2 pId=p01 value=1 pId=p01 value=4 sku-id value sku201 pId=p02 value=2 sku102 pId=p01 value=1 sku101 pId=p01 value=4 Inventory Topic (kafka) Inventory KTable (web-pdp local store) p01 p02 p03 p02 Red Shoes Blak Dress White Scarf Black Dress p-id value p01 Red Shoes p03 White Scarf p02 Black Dress Products Topic (kafka) Product KTable (web-pdp local store)
  • 21. web-pdp - streaming topology 2/2 val inventoriesByProduct: KTable[ProductKey, Inventories] = inventoriesTable .groupBy((inventoryKey, inventory) = (ProductKey(inventoryKey.productId), inventory)) ) .aggregate[Inventories]( () = Inventories(List[Inventory]()), (_: ProductKey, inv: Inventory, acc: Inventories) = Inventories(inv :: acc.items), (_: ProductKey, inv: Inventory, acc: Inventories) = Inventories(acc.items.filter(_.skuId != inv.skuId)) ) def materializeView(product: Product, inventories: Inventories): ProductEnriched = ??? productsTable .join( inventoriesByProduct, materializeView, Materialized.as(product-enriched) ) val kStreams = new KafkaStreams(builder.build(), streamsConfig) kStreams.start() sku-id value sku201 pId=p02 value=2 sku102 pId=p01 value=1 sku101 pId=p01 value=4 Inventory KTable (web-pdp local store) p-id value p02 [sku=sku201, value=2] p01 [sku=sku102, value=1 sku=sku101, value=4]] Inventory By Product KTable (web-pdp local store) p-id value p02 Black Dress [sku=sku201, value=2] p01 Red Shoes [sku=sku102, value=1 sku=sku101, value=4]] p03 White Scarf products-enriched (web-pdp local store) p-id value p01 Red Shoes p03 White Scarf p02 Black Dress Product KTable (web-pdp local store)
  • 22. web-pdp - Interactive queries val store: ReadOnlyKeyValueStore[ProductKey, ProductEnriched] = kStreams.store(product-enriched, QueryableStoreTypes.keyValueStore()) val productEnriched = store.get(new ProductKey(p01)) p-id value p02 Black Dress [sku=sku201, value=2] p01 Red Shoes [sku=sku102, value=1 sku=sku101, value=4]] p03 White Scarf products-enriched (web-pdp local store) * Startup Time 2 min (single node) * Really fast response time: data is local and fully precomputed * No dependencies to other services - less things can go wrong * Centralized monitoring / alerting
  • 23. Summary ● Lambda architecture works well but the implementation is not trivial ● Stream processing introduces a new programming paradigm ● Use the schema registry from day 1 to support schema changes compatibility and avoid to break downstream consumers ● A replayable log (Kafka) and a streaming library (Kafka Streams) give the freedom to slice, dice, enrich and evolve data locally as it arrives increasing resilience and performance
  • 24. Books
  • 25. Resources ● Data on the Outside versus Data on the Inside [P. Helland - 2005] ● The Log: What every software engineer should know about real-time data's unifying abstraction [J. Kreps - 2013] ● Questioning the Lambda Architecture [J. Kreps - 2014] ● Turning the database inside-out with Apache Samza [M. Kleppmann - 2015] ● The Data Dichotomy: Rethinking the Way We Treat Data and Services [B. Stopford - 2016] ● Introducing Kafka Streams: Stream Processing Made Simple [J. Kreps - 2016] ● Building a µ-services ecosystem with Kafka Stream and KSQL [B. Stopford - 2017] ● Streams and Tables: Two Sides of the Same Coin [Sax - Weidlich - Wang - Freytag - 2018]