From a Kafkaesque Story to the Promised Land

7/7/2013
Ran Silberman
Open Source paradigm
The Cathedral & the Bazaar by Eric S Raymond, 1999
the struggle between top-down and bottom-up design
Challenges of data platform[1]

• High throughput
• Horizontal scale to address growth
• High availability of data services
• No data loss
• Satisfy real-time demands
• Enforce structural data with schemas
• Process Big Data and Enterprise Data
• Single Source of Truth (SSOT)
SLAs of data platform
[Diagram: Real-time servers; Data Bus; real-time customers (Real-time dashboards); offline customers (BI DWH)]

Offline path (BI DWH) SLA:
1. 98% in < 1/2 hr
2. 99.999% in < 4 hrs

Real-time path (Real-time dashboards) SLA:
1. 98% in < 500 msec
2. No send > 2 sec
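
A minimal sketch of how the real-time SLA on this slide (98% in < 500 msec, no send > 2 sec) could be checked against measured send latencies. The function name, its parameters, and the idea of collecting latencies in a list are assumptions for illustration, not part of the deck.

```python
def meets_realtime_sla(latencies_ms, fast_ms=500, fast_share=0.98, max_ms=2000):
    """Return True if at least `fast_share` of sends finished in under
    `fast_ms` and no single send took longer than `max_ms`."""
    if not latencies_ms:
        return True  # nothing was sent, so nothing violated the SLA
    fast = sum(1 for ms in latencies_ms if ms < fast_ms)
    share_ok = fast / len(latencies_ms) >= fast_share
    ceiling_ok = max(latencies_ms) <= max_ms
    return share_ok and ceiling_ok
```

The same check applies to the offline SLA by passing the half-hour and four-hour limits in milliseconds and the matching share.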
Legacy Data flow in LivePerson
[Diagram: RealTime servers; ETL (Sessionize, Modeling, Schema View); BI DWH (Oracle); Customers view reports]
1st phase - move to Hadoop
[Diagram: RealTime servers; ETL (Sessionize, Modeling, Schema View); Hadoop (HDFS); MR job transfers data to BI DWH (Vertica); Customers view reports]
2. Move to Kafka
[Diagram: RealTime servers; Kafka (Topic-1); Hadoop (HDFS); MR job transfers data to BI DWH (Vertica); Customers view reports]
3. Integrate with new producers
[Diagram: RealTime servers and new RealTime servers; Kafka (Topic-1, Topic-2); Hadoop (HDFS); MR job transfers data to BI DWH (Vertica); Customers view reports]
4. Add Real-time BI
[Diagram: RealTime servers and new RealTime servers; Kafka (Topic-1, Topic-2); Storm Topology; Hadoop (HDFS); MR job transfers data to BI DWH (Vertica); Customers view reports]
5. Standardize Data-Model using Avro
[Diagram: RealTime servers and new RealTime servers; Kafka (Topic-1, Topic-2); Storm Topology; Camus; Hadoop (HDFS); MR job transfers data to BI DWH (Vertica); Customers view reports]
6. Define Single Source of Truth (SSOT)
[Diagram: RealTime servers and new RealTime servers; Kafka (Topic-1, Topic-2); Storm Topology; Camus; Hadoop (HDFS); MR job transfers data to BI DWH (Vertica); Customers view reports]
Kafka[2] as Backbone for Data

• Central "Message Bus"
• Support multiple topics (MQ style)
• Write ahead to files
• Distributed & Highly Available
• Horizontal Scale
• High throughput (10s MB/Sec per server)
• Service is agnostic to consumers' state
• Retention policy
Kafka Architecture
Kafka Architecture cont.
[Diagram: Producer 1, Producer 2, Producer 3; Zookeeper; Node 1, Node 2, Node 3; Consumer 1]
Kafka Architecture cont.
[Diagram: Producer 1 (Topic1), Producer 2 (Topic2); Zookeeper; Node 1, Node 2, Node 3, Node 4; Group1: Consumer 1, Consumer 2, Consumer 3]
Kafka replay messages.
[Diagram: Zookeeper; Node 3, Node 4; Min Offset and Max Offset markers]

fetchRequest = new FetchRequest(topic, partition, offset, size);
Current offset: taken from Zookeeper
Earliest offset: -2
Latest offset: -1
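
A small sketch of the replay idea above: a fetch names a starting offset, and the two special values -2 and -1 stand for the earliest and latest positions. This is an in-memory stand-in for a single partition, not the Kafka client. The class and method names are hypothetical, and it assumes logical (per-message) offsets, as in Kafka 0.8.

```python
EARLIEST = -2  # special value from the slide: oldest retained message
LATEST = -1    # special value from the slide: position after the newest message


class PartitionLog:
    """In-memory stand-in for one partition (illustration only)."""

    def __init__(self, min_offset=0):
        self.min_offset = min_offset  # offset of the oldest retained message
        self.messages = []

    @property
    def max_offset(self):
        # Offset that the next appended message will receive.
        return self.min_offset + len(self.messages)

    def append(self, message):
        self.messages.append(message)

    def resolve(self, offset):
        """Translate the special values into a concrete offset."""
        if offset == EARLIEST:
            return self.min_offset
        if offset == LATEST:
            return self.max_offset
        if not self.min_offset <= offset <= self.max_offset:
            raise ValueError("offset %d is outside the retained range" % offset)
        return offset

    def fetch(self, offset, size):
        """Return up to `size` messages starting at `offset`."""
        start = self.resolve(offset) - self.min_offset
        return self.messages[start:start + size]
```

Replaying from the beginning is then `log.fetch(EARLIEST, n)`, while a consumer that only wants new data starts from `log.resolve(LATEST)`.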
Kafka API[3]

• Producer API
• Consumer API
  o High-level API: using Zookeeper to access brokers and to save offsets
  o SimpleConsumer API: direct access to Kafka brokers
• Kafka-Spout, Camus, and KafkaHadoopConsumer all use SimpleConsumer
Kafka API[3]

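• Producer

// producer is a kafka.javaapi.producer.Producer<String, String>
List<KeyedMessage<String, String>> messages = new ArrayList<KeyedMessage<String, String>>();
messages.add(new KeyedMessage<String, String>("topic1", null, msg1)); // null key
producer.send(messages);

• Consumer

// consumer is a kafka.javaapi.consumer.ConsumerConnector; ask for one stream of "topic1"
Map<String, List<KafkaStream<byte[], byte[]>>> streams =
    consumer.createMessageStreams(Collections.singletonMap("topic1", 1));
for (MessageAndMetadata<byte[], byte[]> message : streams.get("topic1").get(0)) {
    // do something with message.message()
}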
Kafka in Unit Testing
• Use of class KafkaServer
• Run embedded server
Introducing Avro[5]

• Schema representation using JSON
• Support types
  o Primitive types: boolean, int, long, string, etc.
  o Complex types: Record, Enum, Union, Arrays, Maps, Fixed
• Data is serialized using its schema
• Avro files include file-header of the schema
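
To make "schema representation using JSON" concrete, here is a sketch of an Avro record schema that mixes primitive and complex types, parsed with Python's standard `json` module only (no Avro library involved). The record name `ChatEvent`, its namespace, and the `channel`, `tags`, and `agentId` fields are made up for illustration; `sessionId` and `timestamp` echo the header fields shown later in the deck.

```python
import json

SCHEMA_JSON = """
{
  "type": "record",
  "name": "ChatEvent",
  "namespace": "example.events",
  "fields": [
    {"name": "sessionId", "type": "string"},
    {"name": "timestamp", "type": "long"},
    {"name": "channel",
     "type": {"type": "enum", "name": "Channel", "symbols": ["WEB", "MOBILE"]}},
    {"name": "tags", "type": {"type": "array", "items": "string"}},
    {"name": "agentId", "type": ["null", "string"], "default": null}
  ]
}
"""


def describe_fields(schema):
    """Map each field name to the kind of Avro type it declares."""
    def kind(avro_type):
        if isinstance(avro_type, str):
            return avro_type      # primitive, e.g. "string" or "long"
        if isinstance(avro_type, list):
            return "union"        # a JSON array declares a union
        return avro_type["type"]  # complex type, e.g. "enum" or "array"
    return {field["name"]: kind(field["type"]) for field in schema["fields"]}


schema = json.loads(SCHEMA_JSON)
```

Here `describe_fields(schema)` reports two primitives, an enum, an array, and a union, which covers most of the type families listed above.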
Add Avro protocol to the story
[Diagram: Producer 1; Schema Repo; Kafka (Topic 1, Topic 2); Consumers: Camus/Storm]

Kafka message: Header (1.0) followed by the Avro message, e.g.
{event1:{header:{sessionId:"102122"},{timestamp:"12346"}}...

Producer 1:
1. Create message according to Schema 1.0
2. Register schema 1.0 in the Schema Repo
3. Encode message with Schema 1.0
4. Add revision to message header
5. Send message

Consumers (Camus/Storm):
1. Read message
2. Extract header and obtain schema version
3. Get schema by version 1.0 from the Schema Repo
4. Decode message with schema 1.0
5. Pass message
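
The flow on this slide can be sketched roughly as follows. This is not the deck's implementation: `SchemaRepo`, `encode`, and `decode` are hypothetical names, the schema is reduced to an ordered list of field names, and a JSON array stands in for the Avro binary encoding. What it keeps from the slide is the idea that the body carries no field names, so a consumer needs the revision in the header to fetch the right schema before decoding.

```python
import json
import struct


class SchemaRepo:
    """Hypothetical in-memory schema repository keyed by revision string."""

    def __init__(self):
        self._schemas = {}

    def register(self, version, schema):
        self._schemas[version] = schema

    def get(self, version):
        return self._schemas[version]


def encode(version, record, repo):
    """Producer side: encode with the schema, then prepend the revision header."""
    schema = repo.get(version)
    # Stand-in for Avro encoding: values only, in schema field order.
    body = json.dumps([record[name] for name in schema["fields"]]).encode("utf-8")
    header = version.encode("utf-8")
    # One length byte, then the revision string, then the encoded body.
    return struct.pack(">B", len(header)) + header + body


def decode(message, repo):
    """Consumer side: extract the revision, fetch the schema, decode the body."""
    (header_len,) = struct.unpack_from(">B", message, 0)
    version = message[1:1 + header_len].decode("utf-8")
    schema = repo.get(version)
    values = json.loads(message[1 + header_len:].decode("utf-8"))
    return version, dict(zip(schema["fields"], values))
```

Because every message names its own revision, producers can move to a newer schema while consumers still decode older messages with the revision they were written in.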
Kafka + Storm + Avro example

• Demonstrating use of Avro data passing from Kafka to Storm
• Explains Avro revision evolution
• Requires Kafka and Zookeeper installed
• Uses Storm artifact and Kafka-Spout artifact in Maven
• Plugin generates Java classes from Avro Schema
• https://github.com/ransilberman/avro-kafka-storm
Resiliency
[Diagram: Producer machine: Producer (send message to Kafka; persist message to local disk); local file; Kafka Bridge (send message to Kafka). Kafka: Fast Topic, read by the real-time consumer (Storm); Consistent Topic, read by the offline consumer (Hadoop)]
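
A rough sketch of the resiliency pattern in this diagram: the producer persists every message to a local file and also sends it directly, while a separate bridge forwards the persisted copy. The class and function names are hypothetical, the two `send_*` callables stand in for real Kafka producers, and the pairing of the direct send with the fast topic and the bridge with the consistent topic is an assumption read from the diagram layout.

```python
import json
import os


class SpoolingProducer:
    """Persist each message locally, then attempt a direct (fast) send."""

    def __init__(self, spool_path, send_fast):
        self.spool_path = spool_path
        self.send_fast = send_fast  # callable(message); may raise on failure

    def send(self, message):
        # Persist first, so the message survives a failed direct send.
        with open(self.spool_path, "a", encoding="utf-8") as spool:
            spool.write(json.dumps(message) + "\n")
            spool.flush()
            os.fsync(spool.fileno())
        try:
            self.send_fast(message)
        except Exception:
            pass  # the bridge still delivers the persisted copy


def run_bridge(spool_path, send_consistent, start_line=0):
    """Forward spooled messages from `start_line` on.

    Returns the line number at which the next run should resume."""
    next_line = start_line
    with open(spool_path, encoding="utf-8") as spool:
        for index, line in enumerate(spool):
            if index < start_line:
                continue  # already forwarded in an earlier run
            send_consistent(json.loads(line))
            next_line = index + 1
    return next_line
```

With this split, the fast path may drop a message when a broker is unreachable, while the bridged path eventually carries every message that reached the local disk.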
Challenges of Kafka

• Still not mature enough
• Not enough supporting tools (viewers, maintenance)
• Duplications may occur
• API not documented enough
• Open Source - support by community only
• Difficult to replay messages from specific point in time
• Eventually Consistent...
Eventually Consistent
Because it is a distributed system:

• No guarantee for delivery order
• No way to tell to which broker a message is sent
• Kafka does not guarantee that there are no duplications
• ...But eventually, all messages will arrive!
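
Since duplicates and out-of-order delivery are possible, a consumer can compensate on its side. The sketch below assumes every event carries a unique `id` and a `timestamp`; both field names and both helper functions are hypothetical, not something the deck prescribes.

```python
def deduplicate(events, seen=None):
    """Yield each event once, using its id to recognise repeats."""
    seen = set() if seen is None else seen
    for event in events:
        if event["id"] in seen:
            continue  # duplicate delivery, already handled
        seen.add(event["id"])
        yield event


def in_event_order(events):
    """Restore generation order for a batch that arrived shuffled."""
    return sorted(events, key=lambda event: (event["timestamp"], event["id"]))
```

Passing the same `seen` set across batches keeps the de-duplication effective for as long as the set is retained.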
[Image: Desert; Event generated; Event destination]
Major Improvements in Kafka 0.8[4]

• Partitions replication
• Message send guarantee
• Consumer offsets are represented as numbers instead of bytes (e.g., 1, 2, 3, ...)
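
The difference between the two offset styles can be illustrated with a toy calculation. It assumes a list of stored message sizes in bytes; the sizes and function names are invented, and the sequence here starts at 0 rather than 1.

```python
def byte_offsets(sizes):
    """Offset of each message when an offset is the number of bytes before it."""
    offsets, position = [], 0
    for size in sizes:
        offsets.append(position)
        position += size
    return offsets


def logical_offsets(sizes):
    """Offset of each message when offsets are simple sequence numbers."""
    return list(range(len(sizes)))
```

With sequence numbers, the message after offset n is always n + 1, so a consumer no longer needs to know message sizes to step through the log.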
Addressing Data Challenges

• High throughput
  o Kafka, Hadoop
• Horizontal scale to address growth
  o Kafka, Storm, Hadoop
• High availability of data services
  o Kafka, Storm, Zookeeper
• No Data loss
  o Highly Available services, No ETL
Addressing Data Challenges Cont.

• Satisfy Real-Time demands
  o Storm
• Enforce structural data with schemas
  o Avro
• Process Big Data and Enterprise Data
  o Kafka, Hadoop
• Single Source of Truth (SSOT)
  o Hadoop, No ETL
References

• [1] Satisfying new requirements for Data Integration, by David Loshin
• [2] Apache Kafka
• [3] Kafka API
• [4] Kafka 0.8 Quick Start
• [5] Apache Avro
• [6] Storm
Thank you!

More Related Content

What's hot

Apache Kafka Introduction
Apache Kafka IntroductionApache Kafka Introduction
Apache Kafka IntroductionAmita Mirajkar
 
Deep dive into Apache Kafka consumption
Deep dive into Apache Kafka consumptionDeep dive into Apache Kafka consumption
Deep dive into Apache Kafka consumptionAlexandre Tamborrino
 
Reducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive StreamsReducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive Streamsjimriecken
 
AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...
AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...
AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...Lucas Jellema
 
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...Erik Onnen
 
Production Ready Kafka on Kubernetes (Devandra Tagare, Lyft) Kafka Summit SF ...
Production Ready Kafka on Kubernetes (Devandra Tagare, Lyft) Kafka Summit SF ...Production Ready Kafka on Kubernetes (Devandra Tagare, Lyft) Kafka Summit SF ...
Production Ready Kafka on Kubernetes (Devandra Tagare, Lyft) Kafka Summit SF ...confluent
 
Let the alpakka pull your stream
Let the alpakka pull your streamLet the alpakka pull your stream
Let the alpakka pull your streamEnno Runne
 
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...confluent
 
Kafka on Kubernetes: Does it really have to be "The Hard Way"? (Viktor Gamov,...
Kafka on Kubernetes: Does it really have to be "The Hard Way"? (Viktor Gamov,...Kafka on Kubernetes: Does it really have to be "The Hard Way"? (Viktor Gamov,...
Kafka on Kubernetes: Does it really have to be "The Hard Way"? (Viktor Gamov,...confluent
 
Kafka Summit SF 2017 - Exactly-once Stream Processing with Kafka Streams
Kafka Summit SF 2017 - Exactly-once Stream Processing with Kafka StreamsKafka Summit SF 2017 - Exactly-once Stream Processing with Kafka Streams
Kafka Summit SF 2017 - Exactly-once Stream Processing with Kafka Streamsconfluent
 
Building an Event Bus at Scale
Building an Event Bus at ScaleBuilding an Event Bus at Scale
Building an Event Bus at Scalejimriecken
 
Introducing Exactly Once Semantics To Apache Kafka
Introducing Exactly Once Semantics To Apache KafkaIntroducing Exactly Once Semantics To Apache Kafka
Introducing Exactly Once Semantics To Apache KafkaApurva Mehta
 
Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...
Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...
Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...Guozhang Wang
 
Kafka replication apachecon_2013
Kafka replication apachecon_2013Kafka replication apachecon_2013
Kafka replication apachecon_2013Jun Rao
 
Streaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in ProductionStreaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in Productionconfluent
 
From Three Nines to Five Nines - A Kafka Journey
From Three Nines to Five Nines - A Kafka JourneyFrom Three Nines to Five Nines - A Kafka Journey
From Three Nines to Five Nines - A Kafka JourneyAllen (Xiaozhong) Wang
 
Data Pipelines with Kafka Connect
Data Pipelines with Kafka ConnectData Pipelines with Kafka Connect
Data Pipelines with Kafka ConnectKaufman Ng
 
Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)Timothy Spann
 

What's hot (20)

Apache Kafka Introduction
Apache Kafka IntroductionApache Kafka Introduction
Apache Kafka Introduction
 
Deep dive into Apache Kafka consumption
Deep dive into Apache Kafka consumptionDeep dive into Apache Kafka consumption
Deep dive into Apache Kafka consumption
 
Reducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive StreamsReducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive Streams
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...
AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...
AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...
 
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
 
Production Ready Kafka on Kubernetes (Devandra Tagare, Lyft) Kafka Summit SF ...
Production Ready Kafka on Kubernetes (Devandra Tagare, Lyft) Kafka Summit SF ...Production Ready Kafka on Kubernetes (Devandra Tagare, Lyft) Kafka Summit SF ...
Production Ready Kafka on Kubernetes (Devandra Tagare, Lyft) Kafka Summit SF ...
 
Let the alpakka pull your stream
Let the alpakka pull your streamLet the alpakka pull your stream
Let the alpakka pull your stream
 
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...
 
Kafka on Kubernetes: Does it really have to be "The Hard Way"? (Viktor Gamov,...
Kafka on Kubernetes: Does it really have to be "The Hard Way"? (Viktor Gamov,...Kafka on Kubernetes: Does it really have to be "The Hard Way"? (Viktor Gamov,...
Kafka on Kubernetes: Does it really have to be "The Hard Way"? (Viktor Gamov,...
 
Kafka Summit SF 2017 - Exactly-once Stream Processing with Kafka Streams
Kafka Summit SF 2017 - Exactly-once Stream Processing with Kafka StreamsKafka Summit SF 2017 - Exactly-once Stream Processing with Kafka Streams
Kafka Summit SF 2017 - Exactly-once Stream Processing with Kafka Streams
 
Building an Event Bus at Scale
Building an Event Bus at ScaleBuilding an Event Bus at Scale
Building an Event Bus at Scale
 
Introducing Exactly Once Semantics To Apache Kafka
Introducing Exactly Once Semantics To Apache KafkaIntroducing Exactly Once Semantics To Apache Kafka
Introducing Exactly Once Semantics To Apache Kafka
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafka
 
Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...
Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...
Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...
 
Kafka replication apachecon_2013
Kafka replication apachecon_2013Kafka replication apachecon_2013
Kafka replication apachecon_2013
 
Streaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in ProductionStreaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in Production
 
From Three Nines to Five Nines - A Kafka Journey
From Three Nines to Five Nines - A Kafka JourneyFrom Three Nines to Five Nines - A Kafka Journey
From Three Nines to Five Nines - A Kafka Journey
 
Data Pipelines with Kafka Connect
Data Pipelines with Kafka ConnectData Pipelines with Kafka Connect
Data Pipelines with Kafka Connect
 
Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)
 

Viewers also liked

Breaking Bad Backlinks
Breaking Bad BacklinksBreaking Bad Backlinks
Breaking Bad Backlinksmarketgoo
 
Conole estonia workshop
Conole estonia workshopConole estonia workshop
Conole estonia workshopgrainne
 
Humans, Robots and The Digital Panopticon
Humans, Robots and The Digital PanopticonHumans, Robots and The Digital Panopticon
Humans, Robots and The Digital PanopticonBenjamin Joffe
 
1 curso de iniciación
1 curso de iniciación1 curso de iniciación
1 curso de iniciaciónpezrayo
 
Tema 2. estilos de liderazgo
Tema 2.  estilos de liderazgoTema 2.  estilos de liderazgo
Tema 2. estilos de liderazgoNorlan Joiner
 
Formulario JMJ11 lectionautas
Formulario JMJ11  lectionautasFormulario JMJ11  lectionautas
Formulario JMJ11 lectionautasMargarita M'Urena
 
El futuro de los smartphones
El futuro de los smartphonesEl futuro de los smartphones
El futuro de los smartphonesGoranGonso
 
Monografía hidrología - temas clase 1
Monografía   hidrología - temas clase 1Monografía   hidrología - temas clase 1
Monografía hidrología - temas clase 1Braidy Araya Melendez
 
Ensamble y desensamble
Ensamble y desensambleEnsamble y desensamble
Ensamble y desensambledracmax
 
Parallel imports in the public health sector
Parallel imports in the public health sectorParallel imports in the public health sector
Parallel imports in the public health sectorAnkush Chattopadhyay
 
Modificaciones De Ansi Sql
Modificaciones De Ansi SqlModificaciones De Ansi Sql
Modificaciones De Ansi Sqlguest0c9485
 
Wor(l)d Global Network - Presentazione (ita)
Wor(l)d Global Network - Presentazione (ita)Wor(l)d Global Network - Presentazione (ita)
Wor(l)d Global Network - Presentazione (ita)Wor(l)d Global Network
 
Perspektive Praktikum
Perspektive PraktikumPerspektive Praktikum
Perspektive Praktikumyellowcow
 

Viewers also liked (20)

1o. mat de apoyo nov dic-2015-2016
1o. mat de apoyo nov dic-2015-20161o. mat de apoyo nov dic-2015-2016
1o. mat de apoyo nov dic-2015-2016
 
Breaking Bad Backlinks
Breaking Bad BacklinksBreaking Bad Backlinks
Breaking Bad Backlinks
 
Manual compra armas
Manual compra armasManual compra armas
Manual compra armas
 
Conole estonia workshop
Conole estonia workshopConole estonia workshop
Conole estonia workshop
 
Humans, Robots and The Digital Panopticon
Humans, Robots and The Digital PanopticonHumans, Robots and The Digital Panopticon
Humans, Robots and The Digital Panopticon
 
1 curso de iniciación
1 curso de iniciación1 curso de iniciación
1 curso de iniciación
 
Tema 2. estilos de liderazgo
Tema 2.  estilos de liderazgoTema 2.  estilos de liderazgo
Tema 2. estilos de liderazgo
 
Communications
CommunicationsCommunications
Communications
 
Formulario JMJ11 lectionautas
Formulario JMJ11  lectionautasFormulario JMJ11  lectionautas
Formulario JMJ11 lectionautas
 
El futuro de los smartphones
El futuro de los smartphonesEl futuro de los smartphones
El futuro de los smartphones
 
Monografía hidrología - temas clase 1
Monografía   hidrología - temas clase 1Monografía   hidrología - temas clase 1
Monografía hidrología - temas clase 1
 
HPS Worldwide
HPS WorldwideHPS Worldwide
HPS Worldwide
 
Ensamble y desensamble
Ensamble y desensambleEnsamble y desensamble
Ensamble y desensamble
 
Parallel imports in the public health sector
Parallel imports in the public health sectorParallel imports in the public health sector
Parallel imports in the public health sector
 
Klarna checkout
Klarna checkoutKlarna checkout
Klarna checkout
 
Modificaciones De Ansi Sql
Modificaciones De Ansi SqlModificaciones De Ansi Sql
Modificaciones De Ansi Sql
 
Wor(l)d Global Network - Presentazione (ita)
Wor(l)d Global Network - Presentazione (ita)Wor(l)d Global Network - Presentazione (ita)
Wor(l)d Global Network - Presentazione (ita)
 
Email Collections
Email CollectionsEmail Collections
Email Collections
 
Presentacion Yokohama
Presentacion YokohamaPresentacion Yokohama
Presentacion Yokohama
 
Perspektive Praktikum
Perspektive PraktikumPerspektive Praktikum
Perspektive Praktikum
 

Similar to Kafka Data Platform Evolution

From a kafkaesque story to The Promised Land
From a kafkaesque story to The Promised LandFrom a kafkaesque story to The Promised Land
From a kafkaesque story to The Promised LandRan Silberman
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Guido Schmutz
 
Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...
Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...
Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...LINE Corporation
 
Keystone - ApacheCon 2016
Keystone - ApacheCon 2016Keystone - ApacheCon 2016
Keystone - ApacheCon 2016Peter Bakas
 
World of Tanks Experience of Using Kafka
World of Tanks Experience of Using KafkaWorld of Tanks Experience of Using Kafka
World of Tanks Experience of Using KafkaLevon Avakyan
 
Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!Guido Schmutz
 
Akka in Production - ScalaDays 2015
Akka in Production - ScalaDays 2015Akka in Production - ScalaDays 2015
Akka in Production - ScalaDays 2015Evan Chan
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Guido Schmutz
 
Capital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream ProcessingCapital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream Processingconfluent
 
Apache Kafka
Apache KafkaApache Kafka
Apache KafkaJoe Stein
 
Fraud Detection for Israel BigThings Meetup
Fraud Detection  for Israel BigThings MeetupFraud Detection  for Israel BigThings Meetup
Fraud Detection for Israel BigThings MeetupGwen (Chen) Shapira
 
JHipster conf 2019 - Kafka Ecosystem
JHipster conf 2019 - Kafka EcosystemJHipster conf 2019 - Kafka Ecosystem
JHipster conf 2019 - Kafka EcosystemFlorent Ramiere
 
Westpac Bank Tech Talk 1: Dive into Apache Kafka
Westpac Bank Tech Talk 1: Dive into Apache KafkaWestpac Bank Tech Talk 1: Dive into Apache Kafka
Westpac Bank Tech Talk 1: Dive into Apache Kafkaconfluent
 
F_1330_Narkhede_Kafka .pptx
F_1330_Narkhede_Kafka .pptxF_1330_Narkhede_Kafka .pptx
F_1330_Narkhede_Kafka .pptxNIMITJAIN71
 
Event Streaming Architectures with Confluent and ScyllaDB
Event Streaming Architectures with Confluent and ScyllaDBEvent Streaming Architectures with Confluent and ScyllaDB
Event Streaming Architectures with Confluent and ScyllaDBScyllaDB
 
NoSQL afternoon in Japan Kumofs & MessagePack
NoSQL afternoon in Japan Kumofs & MessagePackNoSQL afternoon in Japan Kumofs & MessagePack
NoSQL afternoon in Japan Kumofs & MessagePackSadayuki Furuhashi
 
NoSQL afternoon in Japan kumofs & MessagePack
NoSQL afternoon in Japan kumofs & MessagePackNoSQL afternoon in Japan kumofs & MessagePack
NoSQL afternoon in Japan kumofs & MessagePackSadayuki Furuhashi
 
I can't believe it's not a queue: Kafka and Spring
I can't believe it's not a queue: Kafka and SpringI can't believe it's not a queue: Kafka and Spring
I can't believe it's not a queue: Kafka and SpringJoe Kutner
 

Similar to Kafka Data Platform Evolution (20)

From a kafkaesque story to The Promised Land
From a kafkaesque story to The Promised LandFrom a kafkaesque story to The Promised Land
From a kafkaesque story to The Promised Land
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...
Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...
Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...
 
Keystone - ApacheCon 2016
Keystone - ApacheCon 2016Keystone - ApacheCon 2016
Keystone - ApacheCon 2016
 
Kafka Explainaton
Kafka ExplainatonKafka Explainaton
Kafka Explainaton
 
World of Tanks Experience of Using Kafka
World of Tanks Experience of Using KafkaWorld of Tanks Experience of Using Kafka
World of Tanks Experience of Using Kafka
 
Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!
 
Akka in Production - ScalaDays 2015
Akka in Production - ScalaDays 2015Akka in Production - ScalaDays 2015
Akka in Production - ScalaDays 2015
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
Capital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream ProcessingCapital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream Processing
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Fraud Detection for Israel BigThings Meetup
Fraud Detection  for Israel BigThings MeetupFraud Detection  for Israel BigThings Meetup
Fraud Detection for Israel BigThings Meetup
 
JHipster conf 2019 - Kafka Ecosystem
JHipster conf 2019 - Kafka EcosystemJHipster conf 2019 - Kafka Ecosystem
JHipster conf 2019 - Kafka Ecosystem
 
Westpac Bank Tech Talk 1: Dive into Apache Kafka
Westpac Bank Tech Talk 1: Dive into Apache KafkaWestpac Bank Tech Talk 1: Dive into Apache Kafka
Westpac Bank Tech Talk 1: Dive into Apache Kafka
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
F_1330_Narkhede_Kafka .pptx
F_1330_Narkhede_Kafka .pptxF_1330_Narkhede_Kafka .pptx
F_1330_Narkhede_Kafka .pptx
 
Event Streaming Architectures with Confluent and ScyllaDB
Event Streaming Architectures with Confluent and ScyllaDBEvent Streaming Architectures with Confluent and ScyllaDB
Event Streaming Architectures with Confluent and ScyllaDB
 
NoSQL afternoon in Japan Kumofs & MessagePack
NoSQL afternoon in Japan Kumofs & MessagePackNoSQL afternoon in Japan Kumofs & MessagePack
NoSQL afternoon in Japan Kumofs & MessagePack
 
NoSQL afternoon in Japan kumofs & MessagePack
NoSQL afternoon in Japan kumofs & MessagePackNoSQL afternoon in Japan kumofs & MessagePack
NoSQL afternoon in Japan kumofs & MessagePack
 
I can't believe it's not a queue: Kafka and Spring
I can't believe it's not a queue: Kafka and SpringI can't believe it's not a queue: Kafka and Spring
I can't believe it's not a queue: Kafka and Spring
 

More from LivePerson

Microservices on top of kafka
Microservices on top of kafkaMicroservices on top of kafka
Microservices on top of kafkaLivePerson
 
Graph QL Introduction
Graph QL IntroductionGraph QL Introduction
Graph QL IntroductionLivePerson
 
Kubernetes your tests! automation with docker on google cloud platform
Kubernetes your tests! automation with docker on google cloud platformKubernetes your tests! automation with docker on google cloud platform
Kubernetes your tests! automation with docker on google cloud platformLivePerson
 
Growing into a proactive Data Platform
Growing into a proactive Data PlatformGrowing into a proactive Data Platform
Growing into a proactive Data PlatformLivePerson
 
Measure() or die()
Measure() or die() Measure() or die()
Measure() or die() LivePerson
 
Resilience from Theory to Practice
Resilience from Theory to PracticeResilience from Theory to Practice
Resilience from Theory to PracticeLivePerson
 
System Revolution- How We Did It
System Revolution- How We Did It System Revolution- How We Did It
System Revolution- How We Did It LivePerson
 
Liveperson DLD 2015
Liveperson DLD 2015 Liveperson DLD 2015
Liveperson DLD 2015 LivePerson
 
Http 2: Should I care?
Http 2: Should I care?Http 2: Should I care?
Http 2: Should I care?LivePerson
 
Mobile app real-time content modifications using websockets
Mobile app real-time content modifications using websocketsMobile app real-time content modifications using websockets
Mobile app real-time content modifications using websocketsLivePerson
 
Mobile SDK: Considerations & Best Practices
Mobile SDK: Considerations & Best Practices Mobile SDK: Considerations & Best Practices
Mobile SDK: Considerations & Best Practices LivePerson
 
Functional programming with Java 8
Functional programming with Java 8Functional programming with Java 8
Functional programming with Java 8LivePerson
 
Apache Avro in LivePerson [Hebrew]
Apache Avro in LivePerson [Hebrew]Apache Avro in LivePerson [Hebrew]
Apache Avro in LivePerson [Hebrew]LivePerson
 
Apache Avro and Messaging at Scale in LivePerson
Apache Avro and Messaging at Scale in LivePersonApache Avro and Messaging at Scale in LivePerson
Apache Avro and Messaging at Scale in LivePersonLivePerson
 
Data compression in Modern Application
Data compression in Modern ApplicationData compression in Modern Application
Data compression in Modern ApplicationLivePerson
 
Support Office Hour Webinar - LivePerson API
Support Office Hour Webinar - LivePerson API Support Office Hour Webinar - LivePerson API
Support Office Hour Webinar - LivePerson API LivePerson
 
SIP - Introduction to SIP Protocol
SIP - Introduction to SIP ProtocolSIP - Introduction to SIP Protocol
SIP - Introduction to SIP ProtocolLivePerson
 
Scalding: Reaching Efficient MapReduce
Scalding: Reaching Efficient MapReduceScalding: Reaching Efficient MapReduce
Scalding: Reaching Efficient MapReduceLivePerson
 
Building Enterprise Level End-To-End Monitor System with Open Source Solution...
Building Enterprise Level End-To-End Monitor System with Open Source Solution...Building Enterprise Level End-To-End Monitor System with Open Source Solution...
Building Enterprise Level End-To-End Monitor System with Open Source Solution...LivePerson
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data ScienceLivePerson
 

More from LivePerson (20)

Microservices on top of kafka
Microservices on top of kafkaMicroservices on top of kafka
Microservices on top of kafka
 
Graph QL Introduction
Graph QL IntroductionGraph QL Introduction
Graph QL Introduction
 
Kubernetes your tests! automation with docker on google cloud platform
Kubernetes your tests! automation with docker on google cloud platformKubernetes your tests! automation with docker on google cloud platform
Kubernetes your tests! automation with docker on google cloud platform
 
Growing into a proactive Data Platform
Growing into a proactive Data PlatformGrowing into a proactive Data Platform
Growing into a proactive Data Platform
 
Measure() or die()
Measure() or die() Measure() or die()
Kafka Data Platform Evolution

  • 1. From a Kafkaesque Story to the Promised Land 7/7/2013 Ran Silberman
  • 2. Open Source paradigm: "The Cathedral & the Bazaar" by Eric S. Raymond, 1999; the struggle between top-down and bottom-up design
  • 3. Challenges of data platform[1]
      o High throughput
      o Horizontal scale to address growth
      o High availability of data services
      o No data loss
      o Satisfy real-time demands
      o Enforce structural data with schemas
      o Process Big Data and Enterprise Data
      o Single Source of Truth (SSOT)
  • 4. SLAs of data platform. Diagram labels: Real-time servers; Real-time customers; Offline customers; Data Bus; BI DWH; Real-time dashboards
      o SLA: 1. 98% in < 1/2 hr; 2. 99.999% in < 4 hrs
      o SLA: 1. 98% in < 500 msec; 2. No send > 2 sec
  • 5. Legacy data flow in LivePerson. Diagram labels: RealTime servers; ETL (Sessionize, Modeling, Schema View); Customers View Reports; BI DWH (Oracle)
  • 6. Phase 1: Move to Hadoop. Diagram labels: RealTime servers; ETL (Sessionize, Modeling, Schema View); Customers View Reports; Hadoop; MR job transfers data to BI DWH; HDFS; BI DWH (Vertica)
  • 7. Phase 2: Move to Kafka. Diagram labels: RealTime servers; Kafka (Topic-1); Customers View Reports; Hadoop; 6; MR job transfers data to BI DWH; HDFS; BI DWH (Vertica)
  • 8. Phase 3: Integrate with new producers. Diagram labels: New RealTime servers; RealTime servers; Kafka (Topic-1, Topic-2); Customers View Reports; Hadoop; 6; MR job transfers data to BI DWH; HDFS; BI DWH (Vertica)
  • 9. Phase 4: Add real-time BI. Diagram labels: New RealTime servers; RealTime servers; Kafka (Topic-1, Topic-2); Storm Topology; Customers View Reports; Hadoop; 6; MR job transfers data to BI DWH; HDFS; BI DWH (Vertica)
  • 10. Phase 5: Standardize data model using Avro. Diagram labels: New RealTime servers; RealTime servers; Kafka (Topic-1, Topic-2); Storm Topology; Hadoop; Customers View Reports; Camus; 6; MR job transfers data to BI DWH; HDFS; BI DWH (Vertica)
  • 11. Phase 6: Define Single Source of Truth (SSOT). Diagram labels: New RealTime servers; RealTime servers; Kafka (Topic-1, Topic-2); Storm Topology; Hadoop; Customers View Reports; Camus; 6; MR job transfers data to BI DWH; HDFS; BI DWH (Vertica)
  • 12. Kafka[2] as Backbone for Data
      o Central "Message Bus"
      o Supports multiple topics (MQ style)
      o Write-ahead to files
      o Distributed & highly available
      o Horizontal scale
      o High throughput (10s of MB/sec per server)
      o Service is agnostic to consumers' state
      o Retention policy
  • 14. Kafka Architecture cont. Producer 1 Producer 2 Producer 3 Zookeeper Node 1 Consumer 1 Node 2 Consumer 1 Node 3 Consumer 1
  • 15. Kafka Architecture cont. Producer 1 Topic1 Producer 2 Topic2 Zookeeper Node 1 Node 2 Node 3 Node 4 Group1 Consumer 1 Consumer 2 Consumer 3
  • 16. Kafka replay messages. Diagram labels: Zookeeper; Min Offset -> Max Offset ->; Node 3; Node 4
      FetchRequest fetchRequest = new FetchRequest(topic, partition, offset, size);
      o currentOffset: taken from Zookeeper
      o Earliest offset: -2
      o Latest offset: -1
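The replay slide boils down to picking a start offset: either an explicit value (for example the one stored in Zookeeper) or one of the two sentinels, -2 for earliest and -1 for latest. A minimal sketch of that decision follows. It does not call Kafka; the class name `ReplayOffsets`, the `resolve` helper, and the `minOffset`/`maxOffset` parameters are hypothetical names used only for illustration.

```java
public class ReplayOffsets {
    // Sentinel values quoted on the slide.
    public static final long EARLIEST = -2L;
    public static final long LATEST = -1L;

    /**
     * Hypothetical helper: maps a requested offset to a concrete one.
     * minOffset and maxOffset stand for the range the broker still retains.
     */
    public static long resolve(long requested, long minOffset, long maxOffset) {
        if (requested == EARLIEST) {
            return minOffset;   // replay everything still retained
        }
        if (requested == LATEST) {
            return maxOffset;   // skip history, read only new messages
        }
        if (requested < minOffset || requested > maxOffset) {
            throw new IllegalArgumentException(
                "offset " + requested + " is outside the retained range ["
                + minOffset + ", " + maxOffset + "]");
        }
        return requested;       // explicit offset, e.g. one saved in Zookeeper
    }

    public static void main(String[] args) {
        System.out.println(resolve(EARLIEST, 100L, 900L)); // prints 100
        System.out.println(resolve(LATEST, 100L, 900L));   // prints 900
        System.out.println(resolve(500L, 100L, 900L));     // prints 500
    }
}
```

Rejecting out-of-range offsets is a design choice of this sketch; a real consumer might instead fall back to the earliest retained offset.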
  • 17. Kafka API[3]
      o Producer API
      o Consumer API
          o High-level API: uses Zookeeper to access brokers and to save offsets
          o SimpleConsumer API: direct access to Kafka brokers
      o Kafka-Spout, Camus, and KafkaHadoopConsumer all use SimpleConsumer
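  • 18. Kafka API[3]
      o Producer
          // producer: a kafka.javaapi.producer.Producer<K, V>
          List<KeyedMessage<K, V>> messages = new ArrayList<KeyedMessage<K, V>>();
          messages.add(new KeyedMessage<K, V>("topic1", null, msg1));
          producer.send(messages);
      o Consumer
          // consumer: a ConsumerConnector from Consumer.createJavaConsumerConnector(config)
          Map<String, Integer> topicCount = Collections.singletonMap("topic1", 1);
          Map<String, List<KafkaStream<byte[], byte[]>>> streams = consumer.createMessageStreams(topicCount);
          for (MessageAndMetadata<byte[], byte[]> message : streams.get("topic1").get(0)) {
              // do something with message
          }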
  • 19. Kafka in Unit Testing
      o Use the KafkaServer class
      o Run an embedded server
  • 20. Introducing Avro[5]
      o Schema representation using JSON
      o Supported types
          o Primitive types: boolean, int, long, string, etc.
          o Complex types: Record, Enum, Union, Arrays, Maps, Fixed
      o Data is serialized using its schema
      o Avro files include a file header containing the schema
  • 21. Add Avro protocol to the story
      o Producer 1
          o Create message according to Schema 1.0
          o Register schema 1.0 in the Schema Repo
          o Encode message with Schema 1.0
          o Add revision to message header
          o Send message (Topic 1, Topic 2)
      o Kafka message: header (revision 1.0) followed by the Avro message, e.g. {event1:{header:{sessionId:"102122"},{timestamp:"12346"}}...
      o Consumers: Camus/Storm
          o Read message
          o Extract header and obtain schema version
          o Get schema by version 1.0 from the Schema Repo
          o Decode message with schema 1.0
          o Pass message
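The flow on this slide (add a revision to the message header, then let the consumer look up the schema by that revision) can be sketched without Avro or Kafka on the classpath. Everything below is an assumption for illustration: the deck does not give the wire layout, so the 4-byte integer revision prefix, the class name `VersionedMessage`, and the in-memory `SCHEMA_REPO` map standing in for the schema repository are all hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class VersionedMessage {
    // In-memory stand-in for the schema repository (hypothetical).
    private static final Map<Integer, String> SCHEMA_REPO = new HashMap<>();

    /** Producer side: register a schema under a revision number. */
    public static void registerSchema(int revision, String schemaJson) {
        SCHEMA_REPO.put(revision, schemaJson);
    }

    /** Producer side: prepend the revision (assumed 4 bytes) to the encoded payload. */
    public static byte[] wrap(int revision, byte[] payload) {
        return ByteBuffer.allocate(Integer.BYTES + payload.length)
                .putInt(revision)
                .put(payload)
                .array();
    }

    /** Consumer side: read the revision from the header. */
    public static int revisionOf(byte[] message) {
        return ByteBuffer.wrap(message).getInt();
    }

    /** Consumer side: strip the header and return the encoded payload. */
    public static byte[] payloadOf(byte[] message) {
        ByteBuffer buffer = ByteBuffer.wrap(message);
        buffer.getInt(); // skip the revision header
        byte[] payload = new byte[buffer.remaining()];
        buffer.get(payload);
        return payload;
    }

    /** Consumer side: fetch the schema that matches the message's revision. */
    public static String schemaFor(byte[] message) {
        int revision = revisionOf(message);
        String schema = SCHEMA_REPO.get(revision);
        if (schema == null) {
            throw new IllegalStateException("unknown schema revision " + revision);
        }
        return schema;
    }

    public static void main(String[] args) {
        registerSchema(1, "{\"type\":\"string\"}");
        byte[] message = wrap(1, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(revisionOf(message)); // prints 1
        System.out.println(schemaFor(message));
    }
}
```

In a real pipeline the payload would be Avro-encoded bytes and the consumer would pass the looked-up schema to an Avro decoder; this sketch only shows the header handling.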
  • 22. Kafka + Storm + Avro example
      o Demonstrates Avro data passing from Kafka to Storm
      o Explains Avro revision evolution
      o Requires Kafka and Zookeeper installed
      o Uses the Storm artifact and the Kafka-Spout artifact in Maven
      o Plugin generates Java classes from the Avro schema
      o https://github.com/ransilberman/avro-kafka-storm
  • 23. Resiliency. Diagram labels:
      o Producer machine: Producer (send message to Kafka; persist message to local disk); local file; Kafka Bridge (send message to Kafka)
      o Kafka: Fast Topic; Consistent Topic
      o Real-time consumer: Storm
      o Offline consumer: Hadoop
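One way to read the resiliency diagram: the producer also persists each message to a local file, and a separate bridge process later forwards the spooled messages to Kafka. The sketch below covers only the local-file part and makes several assumptions: the class name `LocalSpool`, the one-message-per-line file format, and the `sender` callback (standing in for an actual Kafka send) are hypothetical and not taken from the deck.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.function.Consumer;

public class LocalSpool {
    private final Path file;

    public LocalSpool(Path file) {
        this.file = file;
    }

    /** Producer side: append one message per line (messages must not contain newlines). */
    public void persist(String message) {
        try {
            Files.write(file, (message + "\n").getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Bridge side: forward every spooled message, then clear the spool. Returns the count. */
    public int drainTo(Consumer<String> sender) {
        try {
            if (!Files.exists(file)) {
                return 0;
            }
            List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
            for (String line : lines) {
                sender.accept(line); // in a real bridge: send to the Kafka topic
            }
            Files.delete(file);      // only reached if every send returned normally
            return lines.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path path = Paths.get(System.getProperty("java.io.tmpdir"),
                "spool-" + System.nanoTime() + ".log");
        LocalSpool spool = new LocalSpool(path);
        spool.persist("event-1");
        spool.persist("event-2");
        System.out.println(spool.drainTo(System.out::println)); // prints both events, then 2
    }
}
```

If the sender throws midway, the file is left in place and a later drain resends everything, which is one source of the duplicates the deck warns about.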
  • 24. Challenges of Kafka
      o Still not mature enough
      o Not enough supporting tools (viewers, maintenance)
      o Duplications may occur
      o API not documented enough
      o Open Source: supported by the community only
      o Difficult to replay messages from a specific point in time
      o Eventually consistent...
  • 25. Eventually Consistent. Because it is a distributed system:
      o No guarantee of delivery order
      o No way to tell to which broker a message is sent
      o Kafka does not guarantee that there are no duplications
      o ...But eventually, all messages will arrive!
      (Image labels: Desert; Event generated; Event destination)
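Because duplicates may occur, a consumer that must not process a message twice needs its own guard. The sketch below shows one common approach, remembering recently seen message IDs in a bounded map. It assumes every message carries a unique ID, which the deck does not state; the class name `Deduplicator` and the `capacity` parameter are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class Deduplicator {
    private final Map<String, Boolean> seen;

    /** capacity bounds memory: only the most recently added IDs are remembered. */
    public Deduplicator(final int capacity) {
        this.seen = new LinkedHashMap<String, Boolean>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity; // evict the oldest ID once over capacity
            }
        };
    }

    /** Returns true the first time an ID is seen, false for a remembered duplicate. */
    public boolean firstTime(String messageId) {
        return seen.put(messageId, Boolean.TRUE) == null;
    }

    public static void main(String[] args) {
        Deduplicator dedup = new Deduplicator(1000);
        System.out.println(dedup.firstTime("msg-1")); // prints true
        System.out.println(dedup.firstTime("msg-1")); // prints false
    }
}
```

The bound is a trade-off: a duplicate that arrives after its ID has been evicted is processed again, so the capacity should cover the window in which redelivery is expected.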
  • 26. Major Improvements in Kafka 0.8[4]
      o Partition replication
      o Message send guarantee
      o Consumer offsets are represented as numbers instead of bytes (e.g., 1, 2, 3, ...)
  • 27. Addressing Data Challenges
      o High throughput: Kafka, Hadoop
      o Horizontal scale to address growth: Kafka, Storm, Hadoop
      o High availability of data services: Kafka, Storm, Zookeeper
      o No data loss: highly available services, no ETL
  • 28. Addressing Data Challenges, cont.
      o Satisfy real-time demands: Storm
      o Enforce structural data with schemas: Avro
      o Process Big Data and Enterprise Data: Kafka, Hadoop
      o Single Source of Truth (SSOT): Hadoop, no ETL
  • 29. References
      o [1] Satisfying new requirements for Data Integration, by David Loshin
      o [2] Apache Kafka
      o [3] Kafka API
      o [4] Kafka 0.8 Quick Start
      o [5] Apache Avro
      o [6] Storm