Kafka and Storm – event
processing in realtime
Guido Schmutz

Jazoon 2013
23.10.2013

2013 © Trivadis
Kafka and Storm – event processing in realtime
22.10.2013


Guido Schmutz
•  Working for Trivadis for more than 16 years
•  Oracle ACE Director for Fusion Middleware and SOA
•  Co-author of several books
•  Consultant, trainer and software architect for Java, Oracle, SOA, EDA, Big Data and Fast Data
•  Member of the Trivadis Architecture Board
•  Technology Manager @ Trivadis
•  More than 20 years of software development experience
•  Contact: guido.schmutz@trivadis.com
•  Blog: http://guidoschmutz.wordpress.com
•  Twitter: gschmutz

Our company
Trivadis is a market leader in IT consulting, system integration, solution engineering and the provision of IT services focusing on [logo] and [logo] technologies in Switzerland, Germany and Austria.
We offer our services in the following strategic business fields:

OPERATION

Trivadis Services takes over the interacting operation of your IT systems.
With over 600 specialists and IT experts in your region
Hamburg, Düsseldorf, Frankfurt, Stuttgart, Freiburg, Wien, München, Basel, Brugg, Bern, Zurich, Lausanne

•  12 Trivadis branches and more than 600 employees
•  200 Service Level Agreements
•  Over 4,000 training participants
•  Research and development budget: CHF 5.0 / EUR 4 million
•  Financially self-supporting and sustainably profitable
•  Experience from more than 1,900 projects per year at over 800 customers

Agenda
1.  Introduction
2.  What is Apache Kafka?
3.  What is Twitter Storm?
4.  Summary

Big Data Definition (Gartner et al)

Characteristics of Big Data: its Volume, Velocity and Variety in combination
•  Volume: tera-, peta-, exa-, zetta-, yottabytes and constantly growing
•  Velocity: “Traditional” computing in an RDBMS is not scalable enough; we search for “linear scalability”
•  Variety: “Only … structured information is not enough”; “95% of produced data is unstructured”
+ Veracity (IBM): information uncertainty
+ Time to action? Big Data + Event Processing = Fast Data

Sample – Show the counts of tweets related to Jazoon (using the term jazoon)

23.10.2013, 15:45 (not showing #jazoon)
http://54.217.159.208:8484/jazoon-restapi/

Velocity
Velocity requirement examples:
•  Recommendation Engine
•  Predictive Analytics
•  Marketing Campaign Analysis
•  Customer Retention and Churn Analysis
•  Social Graph Analysis
•  Capital Markets Analysis
•  Risk Management
•  Rogue Trading
•  Fraud Detection
•  Retail Banking
•  Network Monitoring
•  Research and Development

Agenda
1.  Introduction
2.  What is Apache Kafka?
3.  What is Twitter Storm?
4.  Summary

Apache Kafka
•  A distributed publish-subscribe messaging system
•  Designed for processing real-time activity stream data (logs, metrics collections, social media streams, …)
•  Initially developed at LinkedIn, now an Apache project
•  Does not follow the JMS standard and does not use the JMS API
•  Kafka maintains feeds of messages in topics
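
The topic model can be illustrated with a minimal in-memory sketch. It shows a topic as an append-only log addressed by offsets, where each consumer keeps track of its own position and can rewind because messages stay persisted. This is plain Python and not the Kafka client API. `TopicLog` and `Consumer` are hypothetical names, and a real topic is additionally split into partitions spread over several brokers.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TopicLog:
    """Append-only message log standing in for a single Kafka topic partition."""
    messages: List[str] = field(default_factory=list)

    def append(self, message: str) -> int:
        """Store a message and return its offset (its position in the log)."""
        self.messages.append(message)
        return len(self.messages) - 1

    def read(self, offset: int, max_messages: int = 10) -> List[str]:
        """Return up to max_messages starting at offset; the log keeps no consumer state."""
        return self.messages[offset:offset + max_messages]


@dataclass
class Consumer:
    """Pull-based consumer that remembers its own position in the log."""
    log: TopicLog
    offset: int = 0

    def poll(self, max_messages: int = 10) -> List[str]:
        batch = self.log.read(self.offset, max_messages)
        self.offset += len(batch)
        return batch

    def rewind(self, offset: int) -> None:
        """Move back to an earlier offset; this works because the log persists messages."""
        self.offset = offset


if __name__ == "__main__":
    topic = TopicLog()
    for text in ["tweet 1", "tweet 2", "tweet 3"]:
        topic.append(text)
    reader = Consumer(topic)
    print(reader.poll(2))  # ['tweet 1', 'tweet 2']
    print(reader.poll())   # ['tweet 3']
    reader.rewind(0)
    print(reader.poll())   # ['tweet 1', 'tweet 2', 'tweet 3']
```

Because the broker side (`TopicLog`) stores no per-consumer state, two consumers can read the same topic independently and at different speeds.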

Sample (#1)
1) Use the Twitter Streaming API (Twitter Hosebird Client) to get all the tweets containing the term jazoon
[Diagram: Twitter Stream → tweet → Twitter to Kafka → tweet → Kafka Topic]

Sample (#1)
Code callouts (Twitter to Kafka):
•  Create a filter stream for tracking a list of terms
•  Create a fully configured Hosebird Client instance
•  Start the client in multiple threads

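
The reading side of the Twitter-to-Kafka step can be sketched with the standard library. Client threads push raw stream messages into a shared bounded queue, and the application takes messages off that queue one at a time. This is not the Hosebird API. In the real sample Twitter applies the `track` filter on the server, so the local `matches_terms` function is only a stand-in. All names here are hypothetical.

```python
import queue
import threading
from typing import List

_DONE = object()  # sentinel a client thread puts on the queue when it has no more messages


def matches_terms(text: str, terms: List[str]) -> bool:
    """Case-insensitive stand-in for the 'track' filtering that Twitter applies server-side."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in terms)


def client_thread(raw_messages: List[str], msg_queue: queue.Queue) -> None:
    """Simulates one client thread pushing raw stream messages into the shared queue."""
    for message in raw_messages:
        msg_queue.put(message)  # blocks while the bounded queue is full
    msg_queue.put(_DONE)


def read_stream(sources: List[List[str]], terms: List[str], capacity: int = 100) -> List[str]:
    """Start one thread per source and take messages off the queue until all threads are done."""
    msg_queue: queue.Queue = queue.Queue(maxsize=capacity)
    threads = [threading.Thread(target=client_thread, args=(src, msg_queue)) for src in sources]
    for t in threads:
        t.start()
    finished, matched = 0, []
    while finished < len(threads):
        message = msg_queue.get()  # blocks until a message is available
        if message is _DONE:
            finished += 1
        elif matches_terms(message, terms):
            matched.append(message)  # in the sample, this tweet would be sent on to Kafka
    for t in threads:
        t.join()
    return matched
```

The consumer loop drains the queue while the threads are still running. This keeps a small bounded queue from deadlocking the producers.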
Sample (#1)
Code callouts (Twitter to Kafka):
•  Convert the Twitter status to Avro format
•  Create the Kafka producer
•  Send the message to the Kafka topic

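
The producer steps can be sketched without Kafka or Avro libraries. A status is flattened into a record with a fixed set of fields, then sent with a key to one of several partitions. The schema fields and the `InMemoryProducer` class are hypothetical. The sketch assumes a hash-of-key modulo partition-count scheme. `zlib.crc32` is used only to get a stable hash for the demo and is not the hash Kafka uses. Real Avro serialization is not shown.

```python
import zlib
from typing import Dict, List

RECORD_FIELDS = ("id", "user", "text", "created_at")  # hypothetical record schema


def to_record(status: Dict) -> Dict:
    """Flatten a (hypothetical) status dict into the fields of the record schema."""
    return {
        "id": status["id"],
        "user": status["user"]["screen_name"],
        "text": status["text"],
        "created_at": status["created_at"],
    }


class InMemoryProducer:
    """Stand-in for a producer writing keyed messages to a topic with N partitions."""

    def __init__(self, num_partitions: int):
        self.partitions: List[List[Dict]] = [[] for _ in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        # hash(key) mod partition count; crc32 is used here only for a stable demo hash
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def send(self, key: str, record: Dict) -> int:
        """Append the record to the partition chosen by its key and return that partition."""
        partition = self.partition_for(key)
        self.partitions[partition].append(record)
        return partition
```

Keying by user, for example, sends all tweets of one user to the same partition, which keeps them in order relative to each other.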
Agenda
1.  Introduction
2.  What is Apache Kafka?
3.  What is Twitter Storm?
4.  Summary

What is Twitter Storm?
A platform for doing analysis on streams of data as they come in, so you can react to data as it happens.
•  A highly distributed real-time computation system
•  Provides general primitives to do real-time computation
•  Simplifies working with queues & workers
•  Scalable and fault-tolerant
•  Complementary to Hadoop
Originated at BackType, acquired by Twitter in 2011
Open-sourced in late 2011
Part of the Apache Incubator since September 2013

Hadoop vs. Storm
Different:
•  Hadoop: batch processing; Storm: real-time processing
•  Hadoop: jobs run to completion; Storm: topologies run forever
•  Hadoop: stateful nodes; Storm: stateless nodes
Same:
•  Scalable
•  Guarantees no data loss
•  Open source
Hadoop => big batch processing; Storm => fast, reactive, real-time processing

Idea: simplify dealing with queues & workers, which otherwise means:
•  Scaling is painful: queue partitioning & worker deployment
•  Operational overhead: worker failures & queue backups
•  No guarantees on data processing

What does Storm provide?
•  At-least-once message processing
•  Fault-tolerant
•  Horizontal scalability
•  No intermediate queues
•  Less operational overhead
•  "Just works"
•  Broad set of use cases
   •  Stream processing
   •  Continuous computation
   •  Distributed RPC

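
At-least-once processing rests on a simple idea. The spout keeps each message until it is told that downstream processing succeeded, and it replays the message if processing failed. The sketch below shows only this spout-side bookkeeping. Storm's acker, which tracks whole tuple trees, is not modelled. The method names loosely mirror Storm's `nextTuple`/`ack`/`fail`, but the class itself is hypothetical.

```python
from collections import deque
from typing import Iterable, Optional, Tuple


class ReliableSpout:
    """Spout-side bookkeeping for at-least-once processing: a message stays pending until acked."""

    def __init__(self, source: Iterable[str]):
        self.source = deque(source)  # hypothetical external source
        self.pending = {}            # msg_id -> message, emitted but not yet acked
        self.retry = deque()         # msg_ids that failed and must be re-emitted
        self.next_id = 0

    def next_tuple(self) -> Optional[Tuple[int, str]]:
        """Emit a failed message again first, otherwise the next new message."""
        while self.retry:
            msg_id = self.retry.popleft()
            if msg_id in self.pending:  # skip ids that were acked after they failed
                return msg_id, self.pending[msg_id]
        if self.source:
            msg_id, self.next_id = self.next_id, self.next_id + 1
            message = self.source.popleft()
            self.pending[msg_id] = message
            return msg_id, message
        return None

    def ack(self, msg_id: int) -> None:
        """Fully processed downstream: forget the message."""
        self.pending.pop(msg_id, None)

    def fail(self, msg_id: int) -> None:
        """Failed or timed out downstream: schedule a replay."""
        if msg_id in self.pending and msg_id not in self.retry:
            self.retry.append(msg_id)
```

A failed message is emitted again, so downstream bolts can see it twice. That is the "at least once" part: no loss, but possible duplicates.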
Main Concepts - Streams
Stream
•  An unbounded sequence of tuples
•  Core abstraction in Storm
•  Defined with a schema that names the fields in the tuple
•  Values must be serializable
•  Every stream has an id
Topology
•  Graph where each node is a spout or a bolt
•  Edges indicate which bolt subscribes to which streams
[Diagram: two spouts are the sources of streams A and B; one bolt subscribes to A and emits C, one subscribes to A and emits D, one subscribes to A & B and emits nothing, and one subscribes to C & D]
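
The stream and topology concepts can be pictured in a few lines. Each tuple carries positional values that are read through the stream's schema (its named fields). A topology's edges are simply "which bolt subscribes to which stream". The classes and bolt names below are hypothetical illustrations, not Storm's Java API. The subscriptions follow the slide.

```python
from typing import Any, Dict, List, Sequence, Set


class Fields:
    """Schema of a stream: the names of the fields of every tuple on it."""

    def __init__(self, *names: str):
        if len(set(names)) != len(names):
            raise ValueError("field names must be unique")
        self.names = names
        self._index = {name: i for i, name in enumerate(names)}

    def index(self, name: str) -> int:
        return self._index[name]


class StreamTuple:
    """A tuple on a stream: positional values read through the stream's schema."""

    def __init__(self, stream_id: str, fields: Fields, values: Sequence[Any]):
        if len(values) != len(fields.names):
            raise ValueError(f"expected {len(fields.names)} values, got {len(values)}")
        self.stream_id = stream_id
        self.fields = fields
        self.values = tuple(values)

    def value_by_field(self, name: str) -> Any:
        return self.values[self.fields.index(name)]


# Subscriptions from the slide, as "bolt -> streams it subscribes to" (bolt names hypothetical)
SUBSCRIPTIONS: Dict[str, Set[str]] = {
    "bolt-1": {"A"},
    "bolt-2": {"A"},
    "bolt-3": {"A", "B"},
    "bolt-4": {"C", "D"},
}


def subscribers(stream_id: str) -> List[str]:
    """Follow the topology's edges: which bolts receive tuples of this stream?"""
    return sorted(bolt for bolt, streams in SUBSCRIPTIONS.items() if stream_id in streams)
```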
Main Concepts – Worker, Executor, Task
Worker
•  Executes a subset of a topology; may run one or more executors for one or more components
Executor
•  Thread spawned by a worker
Task
•  Performs the actual data processing
[Diagram: a worker process running several tasks]
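
The nesting of worker, executor and task can be pictured as two assignment steps. Tasks are grouped onto executors (threads), and executors are placed into worker processes. The sketch assumes contiguous task ranges and round-robin placement purely for illustration. Storm's scheduler makes its own placement decisions, and both function names are hypothetical.

```python
from typing import List


def tasks_per_executor(num_tasks: int, num_executors: int) -> List[List[int]]:
    """Split task ids 0..num_tasks-1 into contiguous, near-equal ranges, one per executor (thread)."""
    if num_executors <= 0 or num_tasks < num_executors:
        raise ValueError("need at least one executor and at least one task per executor")
    base, extra = divmod(num_tasks, num_executors)
    ranges, start = [], 0
    for i in range(num_executors):
        size = base + (1 if i < extra else 0)
        ranges.append(list(range(start, start + size)))
        start += size
    return ranges


def executors_per_worker(executors: List[List[int]], num_workers: int) -> List[List[List[int]]]:
    """Place executors onto worker processes round-robin."""
    workers: List[List[List[int]]] = [[] for _ in range(num_workers)]
    for i, executor in enumerate(executors):
        workers[i % num_workers].append(executor)
    return workers
```

For example, 8 tasks on 4 executors gives each thread 2 tasks. Spreading those 4 executors over 2 workers puts 2 threads in each process.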
Main Concepts - Spout
A spout is the component that acts as the source of streams in a topology.
Generally, spouts read tuples from an external source and emit them into the topology.
Spouts can emit more than one stream.

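
A spout that reads from an external source and emits onto more than one stream can be sketched as follows. Each call to `next_tuple` takes one message from the source and returns it together with the id of the stream it belongs to. The stream ids `tweets` and `retweets` and the "RT " routing rule are hypothetical examples, and the class is not Storm's spout interface.

```python
from typing import Dict, Iterator, Optional, Tuple


class TweetSpout:
    """Reads raw tweets from an external source and emits them onto one of two named streams."""

    def __init__(self, source: Iterator[str]):
        self.source = source  # hypothetical external source, e.g. a queue client

    def declare_streams(self) -> Dict[str, Tuple[str, ...]]:
        """Stream id -> field names: the schema of each stream this spout emits."""
        return {"tweets": ("text",), "retweets": ("text",)}

    def next_tuple(self) -> Optional[Tuple[str, Tuple[str]]]:
        """Return (stream_id, values) for the next message, or None when nothing is available."""
        text = next(self.source, None)
        if text is None:
            return None
        stream_id = "retweets" if text.startswith("RT ") else "tweets"
        return stream_id, (text,)
```

Downstream bolts then subscribe to whichever stream id they are interested in.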
Main Concepts - Bolt
A bolt does the processing of the messages
•  Filtering, functions, aggregations, joins, talking to databases, …
•  Can do simple stream transformations
•  Complex transformations often require multiple steps and therefore multiple bolts
Bolts can emit one or more streams

Sample (#2)
2) Use a spout to subscribe to the Kafka topic to get the related tweets
[Diagram: Twitter Stream → tweet → Twitter to Kafka → tweet → Kafka Topic → tweet → Kafka Spout → tweet]

Sample (#2)
Code callouts (Kafka Spout):
•  Use the existing implementation of the Kafka spout from the storm-kafka project
•  Create a Kafka configuration
•  Create the Kafka spout to connect to the topic
Sample (#3)
3) Use a bolt to retrieve the hashtags for each tweet
[Diagram: Twitter Stream → tweet → Twitter to Kafka → tweet → Kafka Topic → tweet → Kafka Spout → tweet → Hashtag Splitter → hashtag]

Sample (#3)
Code callouts (Hashtag Splitter):
•  Execute is called for each tuple arriving on the input stream
•  Declares the output fields for the component; called when the bolt is created

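
The splitter's logic is small enough to sketch directly. For each incoming tweet, one tuple is emitted per hashtag, on a single output field named `hashtag`. This is a plain Python illustration, not Storm's bolt API. The regular expression for hashtags is an assumption, and `emit` is a hypothetical callback standing in for the bolt's output collector.

```python
import re
from typing import Callable, Tuple

HASHTAG = re.compile(r"#(\w+)")


class HashtagSplitter:
    """Splitter bolt sketch: one input tuple (a tweet's text) -> one output tuple per hashtag."""

    def declare_output_fields(self) -> Tuple[str, ...]:
        return ("hashtag",)

    def execute(self, tweet_text: str, emit: Callable[[Tuple[str]], None]) -> None:
        """Called once per input tuple; emits zero or more ('hashtag',) tuples."""
        for tag in HASHTAG.findall(tweet_text):
            emit((tag,))
```

Applied to the example tweet used later in the deck, this emits `("jazoon",)`, `("storm",)` and `("kafka",)`.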
Sample (#4)
4) Use a bolt to count the occurrences of hashtags and store the counts in a Redis NoSQL DB
[Diagram: Twitter Stream → tweets → Twitter to Kafka → tweets → Kafka Topic → tweets → Kafka Spout → tweet → Hashtag Splitter → hashtag → Hashtag Counter]

Sample (#4)
Code callouts (Hashtag Counter):
•  Execute is called for each tuple arriving on the input stream
•  Prepare is called when the bolt is created

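
The counter bolt's life cycle can be sketched with an in-memory `Counter` in place of Redis. `prepare` runs once per bolt instance, which is a natural place to open a database connection. `execute` runs once per incoming hashtag and mimics Redis `INCR`: it increments the value and returns the new one. The class is a hypothetical illustration, not the sample's actual code.

```python
from collections import Counter
from typing import Optional


class HashtagCounter:
    """Counter bolt sketch; an in-memory Counter stands in for the Redis database."""

    def __init__(self) -> None:
        self.store: Optional[Counter] = None

    def prepare(self) -> None:
        """Called once when the bolt instance is created: set up connections/resources here."""
        self.store = Counter()  # with Redis, the client connection would be created here

    def execute(self, hashtag: str) -> int:
        """Called for each incoming ('hashtag',) tuple; mimics Redis INCR (increment, return new value)."""
        if self.store is None:
            raise RuntimeError("prepare() must be called before execute()")
        self.store[hashtag] += 1
        return self.store[hashtag]
```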
Main Concepts - Topology
[Diagram: Kafka Topic → tweets → Kafka Spout → (shuffle grouping) → two Hashtag Splitter instances → (fields grouping) → two Hashtag Counter instances]
Each spout or bolt runs N instances in parallel
•  Shuffle grouping: random grouping
•  Fields grouping: grouped by value, such that equal values result in the same task
•  All grouping: replicates to all tasks
•  Global grouping: makes all tuples go to one task
•  None grouping: you don't care how the stream is grouped; currently equivalent to shuffle grouping (the intent is to eventually run the bolt in the same thread as the bolt/spout it subscribes to, where possible)
•  Direct grouping: the producer (the task that emits) controls which consumer task will receive the tuple

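
The groupings can be expressed as functions that return the target task indices for one emitted tuple. This is a minimal sketch, not Storm's implementation. The fields grouping uses `zlib.crc32` as a stable stand-in hash (Storm uses its own hashing), and the global grouping assumes that task index 0 is the task with the lowest id.

```python
import random
import zlib
from typing import List


def shuffle_grouping(num_tasks: int, rng: random.Random) -> List[int]:
    """Random target task."""
    return [rng.randrange(num_tasks)]


def fields_grouping(value: str, num_tasks: int) -> List[int]:
    """Equal values always map to the same task (stable demo hash; Storm uses its own hashing)."""
    return [zlib.crc32(value.encode("utf-8")) % num_tasks]


def all_grouping(num_tasks: int) -> List[int]:
    """Every task receives a copy of the tuple."""
    return list(range(num_tasks))


def global_grouping(num_tasks: int) -> List[int]:
    """All tuples go to a single task (Storm's docs: the task with the lowest id; index 0 here)."""
    if num_tasks <= 0:
        raise ValueError("no tasks")
    return [0]


def direct_grouping(chosen_task: int, num_tasks: int) -> List[int]:
    """The emitting task itself names the receiving task."""
    if not 0 <= chosen_task < num_tasks:
        raise ValueError("no such task")
    return [chosen_task]
```

Fields grouping is what makes the counter correct: every occurrence of a given hashtag reaches the same counter task.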
Sample – How does it work?
Example tweet: My presentation „Kafka and Storm – event processing in realtime“ tomorrow 12:00 at Jazoon Zurich #jazoon #storm #kafka CU there!
1. The tweet is published to the Kafka topic and read by the Kafka spout.
2. One of the Hashtag Splitter instances extracts the hashtags jazoon, storm and kafka.
3. The Hashtag Counter instances execute INCR jazoon, INCR storm and INCR kafka.
4. Resulting counts: jazoon = 1, storm = 1, kafka = 1
Sample (#5)
5) Set up the topology with the necessary groupings (distribution)
Code callouts (topology setup):
•  Create a stream called "tweet-stream"
•  Register the Kafka spout and run it with a parallelism of 1
•  Subscribe to the stream "tweet-stream" using shuffle grouping
•  Run this bolt with a parallelism of 2
•  Subscribe to "hashtag-splitter" using fields grouping on the field "hashtag"

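
The wiring of the sample can be simulated end to end to show why the two groupings are chosen. Shuffle grouping spreads tweets randomly over the splitter tasks. Fields grouping on `hashtag` sends every occurrence of a hashtag to exactly one counter task, so no count is split across tasks. This is a plain Python simulation under stated assumptions (seeded random choice, `zlib.crc32` as the hash). It is not Storm's `TopologyBuilder`.

```python
import random
import re
import zlib
from collections import Counter
from typing import List

HASHTAG = re.compile(r"#(\w+)")


def run_topology(tweets: List[str], splitter_tasks: int = 2,
                 counter_tasks: int = 2, seed: int = 42) -> List[Counter]:
    """Simulate spout -> (shuffle) -> hashtag splitter -> (fields on 'hashtag') -> hashtag counter."""
    rng = random.Random(seed)
    splitter_inboxes: List[List[str]] = [[] for _ in range(splitter_tasks)]
    counter_state = [Counter() for _ in range(counter_tasks)]

    # Spout with parallelism 1: every tweet goes to a randomly chosen splitter task (shuffle grouping).
    for text in tweets:
        splitter_inboxes[rng.randrange(splitter_tasks)].append(text)

    # Each splitter task emits one tuple per hashtag; fields grouping on "hashtag"
    # sends equal hashtags to the same counter task, so each tag is counted in one place.
    for inbox in splitter_inboxes:
        for text in inbox:
            for tag in HASHTAG.findall(text):
                target = zlib.crc32(tag.encode("utf-8")) % counter_tasks
                counter_state[target][tag] += 1
    return counter_state
```

With shuffle grouping between the splitter and the counter instead, the same hashtag could be counted partly in one task and partly in another.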
Trident
•  High-level abstraction on top of Storm
•  Makes it easier to build topologies
•  Core data model is the stream
•  Processed as a series of batches
•  Stream is partitioned among the nodes in the cluster

•  5 kinds of operations in Trident
   •  Operations that apply locally to each partition and cause no network transfer
   •  Repartitioning operations that don't change the contents
   •  Aggregation operations that do network transfer
   •  Operations on grouped streams
   •  Merges and joins

Trident
Supports
•  Joins
•  Aggregations
•  Grouping
•  Functions
•  Filters
Similar to Hadoop and Pig/Cascading

Trident Concepts - Topology
[Diagram: Kafka Topic → tweet → Kafka Spout → tweet → Hashtag Splitter (local) → hashtag → Hashtag Normalizer (local) → hashtag → groupBy → Persistent Aggregate; the operations are grouped into bolts]
Trident Concepts - Function
•  Takes in a set of input fields and emits zero or more tuples as output
•  The fields of the output tuple are appended to the original input tuple in the stream
•  If a function emits no tuples, the original input tuple is filtered out
•  Otherwise, the input tuple is duplicated for each output tuple

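
The four rules for a Trident function can be checked with a tiny sketch. A function that emits nothing removes the input tuple. Every emitted tuple is appended to a copy of the input tuple. The helper is named after Trident's `each` operation, but it is a simplified Python illustration: it passes the whole row to the function rather than a selected set of input fields. `split_hashtags` is a hypothetical example function.

```python
from typing import Callable, Iterable, List, Tuple

Row = Tuple  # an input tuple, as a Python tuple of field values


def each(rows: Iterable[Row], function: Callable[[Row], List[Row]]) -> List[Row]:
    """Apply a Trident-style function: emitted fields are appended to the input tuple."""
    output: List[Row] = []
    for row in rows:
        for emitted in function(row):     # no emits -> the input row disappears (filtered out)
            output.append(row + emitted)  # one copy of the input per emitted tuple
    return output


def split_hashtags(row: Row) -> List[Row]:
    """Hypothetical function: emits one ('hashtag',) tuple per word starting with '#'."""
    text = row[0]
    return [(word[1:],) for word in text.split() if word.startswith("#") and len(word) > 1]
```

For the rows `("#storm #kafka",)` and `("no tags here",)`, the result keeps two copies of the first row, extended with `"storm"` and `"kafka"`. The second row is dropped.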
Agenda
1.  Introduction
2.  What is Apache Kafka?
3.  What is Twitter Storm?
4.  Summary

Summary
Kafka
•  Distributed, scalable pub/sub system for Big Data
•  Producer => Broker => Consumer of message topics
•  Persists messages with the ability to rewind
•  Consumer decides what it has consumed so far

Storm
•  Express real-time processing naturally
•  Not a Hadoop/MapReduce competitor
•  Supports other languages
•  Hard to debug
•  Object serialization
•  Didn't cover
   •  Reliability through guaranteed message processing
   •  Distributed RPC
   •  Storm cluster setup and deployment

http://54.217.159.208:8484/jazoon-restapi/

Lambda Architecture
Big Data and Fast Data combined
[Diagram: Incoming data flows into two layers. Batch Layer: all data is batch-recomputed into precomputed information (batch views). Speed Layer: the stream is processed and realtime increments produce incremented information (real time views). Serving Layer: merges batch views and real time views to answer a query.]
Source: Marz, N. & Warren, J. (2013) Big Data. Manning.

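
The merge step of the serving layer can be illustrated for the hashtag-count example. The batch layer recomputes its view from all data. The speed layer increments a real-time view for data the last batch run has not yet seen. A query adds the two views together. The function names are hypothetical. For counts, merging is simply addition, whereas other kinds of views need their own merge logic.

```python
from collections import Counter
from typing import Iterable, Mapping


def recompute_batch_view(all_data: Iterable[str]) -> Counter:
    """Batch layer: recompute the view from scratch over the complete, immutable data set."""
    return Counter(all_data)


def increment_realtime_view(view: Counter, event: str) -> None:
    """Speed layer: incrementally update the view for data not yet covered by a batch run."""
    view[event] += 1


def query(key: str, batch_view: Mapping[str, int], realtime_view: Mapping[str, int]) -> int:
    """Serving layer: merge the two views; for counts, merging is simply addition."""
    return batch_view.get(key, 0) + realtime_view.get(key, 0)
```

Once a new batch run has absorbed the recent data, the corresponding real-time increments can be discarded.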
Lambda Architecture
Big Data and Fast Data combined: one possible product/framework mapping
[Diagram: the same Batch, Speed and Serving Layers as above; the product/framework labels were not preserved]
Resources
Sample Code: https://github.com/gschmutz/jazoon-storm-twitter-sample
Twitter Streaming API: https://dev.twitter.com/docs/streaming-apis
Twitter Hosebird Client (HBC): https://github.com/twitter/hbc
Apache Kafka: http://kafka.apache.org/
Storm Website: http://storm-project.net/
Storm Wiki: https://github.com/nathanmarz/storm/wiki
Storm Doc: https://github.com/nathanmarz/storm/wiki/Documentation

Thank You!

Trivadis AG
Guido Schmutz
guido.schmutz@trivadis.com

Lambda Architecture
Big Data and Fast Data combined
[Diagram: Incoming Data feeds the Batch Layer (immutable data → batch view) and the Speed Layer (data stream → realtime view); the Serving Layer answers queries from both views; the steps are labelled A to G]
Guido Schmutz
 
Location Analytics - Real-Time Geofencing using Apache Kafka
Location Analytics - Real-Time Geofencing using Apache KafkaLocation Analytics - Real-Time Geofencing using Apache Kafka
Location Analytics - Real-Time Geofencing using Apache Kafka
Guido Schmutz
 
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS and Apache KafkaSolutions for bi-directional integration between Oracle RDBMS and Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafka
Guido Schmutz
 
What is Apache Kafka? Why is it so popular? Should I use it?
What is Apache Kafka? Why is it so popular? Should I use it?What is Apache Kafka? Why is it so popular? Should I use it?
What is Apache Kafka? Why is it so popular? Should I use it?
Guido Schmutz
 
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaSolutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Guido Schmutz
 
Location Analytics Real-Time Geofencing using Kafka
Location Analytics Real-Time Geofencing using KafkaLocation Analytics Real-Time Geofencing using Kafka
Location Analytics Real-Time Geofencing using Kafka
Guido Schmutz
 
Streaming Visualisation
Streaming VisualisationStreaming Visualisation
Streaming Visualisation
Guido Schmutz
 
Kafka as an event store - is it good enough?
Kafka as an event store - is it good enough?Kafka as an event store - is it good enough?
Kafka as an event store - is it good enough?
Guido Schmutz
 
Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache KafkaSolutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Guido Schmutz
 
Fundamentals Big Data and AI Architecture
Fundamentals Big Data and AI ArchitectureFundamentals Big Data and AI Architecture
Fundamentals Big Data and AI Architecture
Guido Schmutz
 
Location Analytics - Real-Time Geofencing using Kafka
Location Analytics - Real-Time Geofencing using Kafka Location Analytics - Real-Time Geofencing using Kafka
Location Analytics - Real-Time Geofencing using Kafka
Guido Schmutz
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming Visualization
Guido Schmutz
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming Visualization
Guido Schmutz
 
Location Analytics - Real Time Geofencing using Apache Kafka
Location Analytics - Real Time Geofencing using Apache KafkaLocation Analytics - Real Time Geofencing using Apache Kafka
Location Analytics - Real Time Geofencing using Apache Kafka
Guido Schmutz
 

More from Guido Schmutz (20)

30 Minutes to the Analytics Platform with Infrastructure as Code
30 Minutes to the Analytics Platform with Infrastructure as Code30 Minutes to the Analytics Platform with Infrastructure as Code
30 Minutes to the Analytics Platform with Infrastructure as Code
 
Big Data, Data Lake, Fast Data - Dataserialiation-Formats
Big Data, Data Lake, Fast Data - Dataserialiation-FormatsBig Data, Data Lake, Fast Data - Dataserialiation-Formats
Big Data, Data Lake, Fast Data - Dataserialiation-Formats
 
ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!
 
Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?
 
Event Hub (i.e. Kafka) in Modern Data Architecture
Event Hub (i.e. Kafka) in Modern Data ArchitectureEvent Hub (i.e. Kafka) in Modern Data Architecture
Event Hub (i.e. Kafka) in Modern Data Architecture
 
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaSolutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
 
Building Event Driven (Micro)services with Apache Kafka
Building Event Driven (Micro)services with Apache KafkaBuilding Event Driven (Micro)services with Apache Kafka
Building Event Driven (Micro)services with Apache Kafka
 
Location Analytics - Real-Time Geofencing using Apache Kafka
Location Analytics - Real-Time Geofencing using Apache KafkaLocation Analytics - Real-Time Geofencing using Apache Kafka
Location Analytics - Real-Time Geofencing using Apache Kafka
 
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS and Apache KafkaSolutions for bi-directional integration between Oracle RDBMS and Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafka
 
What is Apache Kafka? Why is it so popular? Should I use it?
What is Apache Kafka? Why is it so popular? Should I use it?What is Apache Kafka? Why is it so popular? Should I use it?
What is Apache Kafka? Why is it so popular? Should I use it?
 
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaSolutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
 
Location Analytics Real-Time Geofencing using Kafka
Location Analytics Real-Time Geofencing using KafkaLocation Analytics Real-Time Geofencing using Kafka
Location Analytics Real-Time Geofencing using Kafka
 
Streaming Visualisation
Streaming VisualisationStreaming Visualisation
Streaming Visualisation
 
Kafka as an event store - is it good enough?
Kafka as an event store - is it good enough?Kafka as an event store - is it good enough?
Kafka as an event store - is it good enough?
 
Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache KafkaSolutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
 
Fundamentals Big Data and AI Architecture
Fundamentals Big Data and AI ArchitectureFundamentals Big Data and AI Architecture
Fundamentals Big Data and AI Architecture
 
Location Analytics - Real-Time Geofencing using Kafka
Location Analytics - Real-Time Geofencing using Kafka Location Analytics - Real-Time Geofencing using Kafka
Location Analytics - Real-Time Geofencing using Kafka
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming Visualization
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming Visualization
 
Location Analytics - Real Time Geofencing using Apache Kafka
Location Analytics - Real Time Geofencing using Apache KafkaLocation Analytics - Real Time Geofencing using Apache Kafka
Location Analytics - Real Time Geofencing using Apache Kafka
 

Recently uploaded

Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
Claudio Di Ciccio
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
Alex Pruden
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 

Recently uploaded (20)

Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 

Kafka and Storm - event processing in realtime

  • 1. Kafka and Storm – event processing in realtime Guido Schmutz
  • 2. WELCOME. Kafka and Storm – event processing in realtime. Guido Schmutz, Jazoon 2013, 23.10.2013. Trivadis locations: Basel, Bern, Lausanne, Zürich, Düsseldorf, Frankfurt a.M., Freiburg i.Br., Hamburg, München, Stuttgart, Wien. 2013 © Trivadis Kafka and Storm – event processing in realtime 22.10.2013

  • 3. Guido Schmutz • Working for Trivadis for more than 16 years • Oracle ACE Director for Fusion Middleware and SOA • Co-author of several books • Consultant, trainer and software architect for Java, Oracle, SOA, EDA, Big Data and Fast Data • Member of the Trivadis Architecture Board • Technology Manager @ Trivadis • More than 20 years of software development experience • Contact: guido.schmutz@trivadis.com • Blog: http://guidoschmutz.wordpress.com • Twitter: gschmutz
  • 4. Our company: Trivadis is a market leader in IT consulting, system integration, solution engineering and the provision of IT services focusing on and technologies in Switzerland, Germany and Austria. We offer our services in the following strategic business fields: OPERATION. Trivadis Services takes over the interacting operation of your IT systems.
  • 5. With over 600 specialists and IT experts in your region: Hamburg, Düsseldorf, Frankfurt, Stuttgart, Freiburg, Wien, München, Basel, Brugg, Bern, Zurich, Lausanne. • 12 Trivadis branches and more than 600 employees • 200 Service Level Agreements • Over 4,000 training participants • Research and development budget: CHF 5.0 / EUR 4 million • Financially self-supporting and sustainably profitable • Experience from more than 1,900 projects per year at over 800 customers
  • 6. Agenda 1. Introduction 2. What is Apache Kafka? 3. What is Twitter Storm? 4. Summary
  • 7. Big Data Definition (Gartner et al.). Characteristics of Big Data: its Volume, Velocity and Variety in combination. Tera-, peta-, exa-, zetta-, yottabytes and constantly growing. “Traditional” computing in an RDBMS is not scalable enough; we search for “linear scalability”. “Only … structured information is not enough”: “95% of produced data is unstructured”. + Veracity (IBM): information uncertainty. + Time to action? Big Data + Event Processing = Fast Data
  • 9. Sample – Show the counts of tweets related to Jazoon (using the term jazoon), 23.10.2013 – 15:45 (not showing #jazoon): http://54.217.159.208:8484/jazoon-restapi/
  • 10. Velocity. Velocity requirement examples: • Recommendation Engine • Predictive Analytics • Marketing Campaign Analysis • Customer Retention and Churn Analysis • Social Graph Analysis • Capital Markets Analysis • Risk Management • Rogue Trading • Fraud Detection • Retail Banking • Network Monitoring • Research and Development
  • 11. Agenda 1. Introduction 2. What is Apache Kafka? 3. What is Twitter Storm? 4. Summary
  • 12. Apache Kafka • A distributed publish-subscribe messaging system • Designed for processing real-time activity stream data (logs, metrics collection, social media streams, …) • Initially developed at LinkedIn, now part of Apache • Does not follow the JMS standard and does not use the JMS API • Kafka maintains feeds of messages in topics
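To make “feeds of messages in topics” concrete, here is a minimal in-memory model, assuming a single partition per topic for brevity: the broker only appends messages and hands out offsets, while each consumer keeps its own read position. The names `InMemoryTopic`, `OffsetConsumer`, `read_batch` and `rewind_to` are hypothetical and are not the Kafka client API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InMemoryTopic:
    """Toy stand-in for one Kafka topic partition: an append-only log."""
    name: str
    log: List[bytes] = field(default_factory=list)

    def append(self, message: bytes) -> int:
        # The broker only appends; the returned offset is the message's position in the log.
        self.log.append(message)
        return len(self.log) - 1


@dataclass
class OffsetConsumer:
    """Toy consumer that tracks its own read position (offset) in the log."""
    topic: InMemoryTopic
    offset: int = 0

    def read_batch(self, max_messages: int) -> List[bytes]:
        # Reading deletes nothing; it only advances this consumer's offset.
        batch = self.topic.log[self.offset:self.offset + max_messages]
        self.offset += len(batch)
        return batch

    def rewind_to(self, offset: int) -> None:
        # Because messages are persisted, a consumer can move back and re-read them.
        self.offset = max(0, min(offset, len(self.topic.log)))


topic = InMemoryTopic("tweets")
for text in ["#jazoon rocks", "#storm demo", "#kafka intro"]:
    topic.append(text.encode("utf-8"))

fast = OffsetConsumer(topic)
slow = OffsetConsumer(topic)
print(fast.read_batch(3))   # [b'#jazoon rocks', b'#storm demo', b'#kafka intro']
print(slow.read_batch(1))   # [b'#jazoon rocks']
fast.rewind_to(1)
print(fast.read_batch(10))  # [b'#storm demo', b'#kafka intro']
```

Keeping the position on the consumer side is what lets independent consumers read the same topic at different speeds and re-read older messages.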
  • 13. Sample (#1): 1) Use the Twitter Streaming API (Twitter Hosebird Client) to get all the tweets containing the term jazoon (Twitter Stream → Twitter to Kafka → Kafka Topic). Create a filter stream for tracking a list of terms.
  • 14. Sample (#1), continued: create a fully configured Hosebird Client instance; start the client in multiple threads.
  • 15. Sample (#1), continued: convert the Twitter status to Avro format; create the Kafka producer; send the message to the Kafka topic.
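The Twitter-to-Kafka bridge boils down to three steps: filter by tracked terms, convert each status into a serialized record, and send it to the topic. The sketch below runs without any external library, so it swaps Avro for JSON from the standard library, uses a `queue.Queue` filled by a separate thread in place of the streaming client, and a plain list in place of the Kafka producer. All names (`matches_terms`, `to_record`, `fake_stream_client`, `bridge`) are hypothetical.

```python
import json
import queue
import threading
from typing import Iterable, List


def matches_terms(text: str, terms: Iterable[str]) -> bool:
    # Filter-stream idea: keep only statuses mentioning a tracked term (case-insensitive substring).
    lowered = text.lower()
    return any(term.lower() in lowered for term in terms)


def to_record(status: dict) -> bytes:
    # Stand-in for the Avro conversion: pick the needed fields and serialize them as JSON.
    return json.dumps({"id": status["id"], "text": status["text"]}, sort_keys=True).encode("utf-8")


def fake_stream_client(statuses: List[dict], out: "queue.Queue") -> None:
    # Simulates the client thread that pushes raw statuses into a queue.
    for status in statuses:
        out.put(status)
    out.put(None)  # sentinel: end of this simulated stream


def bridge(statuses: List[dict], terms: List[str]) -> List[bytes]:
    q: "queue.Queue" = queue.Queue()
    topic: List[bytes] = []  # stand-in for "send the message to the Kafka topic"
    client = threading.Thread(target=fake_stream_client, args=(statuses, q))
    client.start()
    while True:
        status = q.get()
        if status is None:
            break
        if matches_terms(status["text"], terms):
            topic.append(to_record(status))
    client.join()
    return topic


statuses = [
    {"id": 1, "text": "See you at #Jazoon tomorrow"},
    {"id": 2, "text": "Lunch time"},
    {"id": 3, "text": "jazoon talk on Kafka and Storm"},
]
sent = bridge(statuses, ["jazoon"])
print(len(sent))  # 2
```

Twitter's server-side term tracking is likely more nuanced than the substring test used here; the point is only the filter, convert, send sequence.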
  • 16. Agenda 1. Introduction 2. What is Apache Kafka? 3. What is Twitter Storm? 4. Summary
  • 17. What is Twitter Storm? A platform for analyzing streams of data as they come in, so you can react to data as it happens. • A highly distributed real-time computation system • Provides general primitives for real-time computation • Simplifies working with queues & workers • Scalable and fault-tolerant • Complementary to Hadoop. Originated at BackType, acquired by Twitter in 2011; open-sourced in late 2011; part of the Apache Incubator since September 2013.
  • 18. Hadoop vs. Storm. Hadoop: batch processing; jobs run to completion; stateful nodes; scalable; guarantees no data loss; open source => big batch processing. Storm: real-time processing; topologies run forever; stateless nodes; scalable; guarantees no data loss; open source => fast, reactive, real-time processing.
  • 19. Idea: simplify dealing with queues & workers. Problems with queues & workers: scaling is painful (queue partitioning & worker deployment); operational overhead (worker failures & queue backups); no guarantees on data processing.
  • 20. What does Storm provide? • At-least-once message processing • Fault tolerance • Horizontal scalability • No intermediate queues • Less operational overhead • “Just works” • Broad set of use cases: stream processing, continuous computation, distributed RPC
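“At least once” means a message is replayed until its processing is acknowledged, so downstream code may see duplicates but never silently loses a message. A minimal simulation, assuming a spout that keeps every emitted message pending until it is acked and re-emits it on failure. The function name and the failure pattern are hypothetical, messages are assumed to be unique strings, and this is not Storm's acker implementation.

```python
from collections import deque
from typing import Callable, Deque, Dict, List


def run_at_least_once(messages: List[str], process: Callable[[str, int], bool]) -> List[str]:
    """Deliver every message until `process` acks it (returns True).

    `process(msg, attempt)` may fail (return False) after doing partial work,
    which is how duplicates arise under at-least-once semantics.
    """
    to_emit: Deque[str] = deque(messages)
    attempts: Dict[str, int] = {}
    delivered: List[str] = []  # everything downstream observed, including replays
    while to_emit:
        msg = to_emit.popleft()
        attempts[msg] = attempts.get(msg, 0) + 1
        delivered.append(msg)
        if not process(msg, attempts[msg]):
            # No ack: the message stays pending and is replayed later.
            to_emit.append(msg)
    return delivered


# Hypothetical failure pattern: "b" fails on its first attempt only.
seen = run_at_least_once(["a", "b", "c"], lambda msg, attempt: not (msg == "b" and attempt == 1))
print(seen)  # ['a', 'b', 'c', 'b']
```

Bolts whose side effects are not idempotent, such as a counter, can therefore over-count after a replay.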
  • 21. Main Concepts - Streams. Stream: • an unbounded sequence of tuples • the core abstraction in Storm • defined with a schema that names the fields in the tuple • values must be serializable • every stream has an id. Topology: • a graph where each node is a spout or bolt • edges indicate which bolts subscribe to which streams. (Figure: two spouts are the sources of streams A and B; one bolt subscribes to A and emits C, another subscribes to A and emits D, a third subscribes to A & B and emits nothing, and a fourth subscribes to C & D.)
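The topology in the figure can be written down as data: each node declares the stream ids it subscribes to and the stream ids it emits, and the edges follow from matching the two. A sketch with hypothetical component names (`spout1`, `bolt1`, …) standing in for the unnamed nodes in the figure; it is not Storm's topology API.

```python
from typing import Dict, List, Set, Tuple

# node -> streams it subscribes to / streams it emits (names are illustrative)
TOPOLOGY: Dict[str, Dict[str, Set[str]]] = {
    "spout1": {"subscribes": set(), "emits": {"A"}},
    "spout2": {"subscribes": set(), "emits": {"B"}},
    "bolt1": {"subscribes": {"A"}, "emits": {"C"}},
    "bolt2": {"subscribes": {"A"}, "emits": {"D"}},
    "bolt3": {"subscribes": {"A", "B"}, "emits": set()},
    "bolt4": {"subscribes": {"C", "D"}, "emits": set()},
}


def edges(topology: Dict[str, Dict[str, Set[str]]]) -> List[Tuple[str, str, str]]:
    # An edge (producer, consumer, stream) exists wherever a consumer subscribes
    # to a stream id that the producer emits.
    result = []
    for producer, p in topology.items():
        for consumer, c in topology.items():
            for stream in sorted(p["emits"] & c["subscribes"]):
                result.append((producer, consumer, stream))
    return sorted(result)


for edge in edges(TOPOLOGY):
    print(edge)
```

Spouts never appear as consumers, since they only act as sources of streams.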
  • 22. Main Concepts – Worker, Executor, Task. Worker: executes a subset of a topology; may run one or more executors for one or more components. Executor: a thread spawned by a worker. Task: performs the actual data processing. (Figure: a worker process containing several tasks.)
  • 23. Main Concepts - Spout. A spout is the component which acts as the source of streams in a topology. Generally, spouts read tuples from an external source and emit them into the topology. Spouts can emit more than one stream.
  • 24. Main Concepts - Bolt. A bolt does the processing of the messages: • filtering, functions, aggregations, joins, talking to databases, … • can do simple stream transformations • complex transformations often require multiple steps and therefore multiple bolts. Bolts can emit one or more streams.
  • 25. Sample (#2): 2) Use a spout to subscribe to the Kafka topic to get the related tweets (Twitter Stream → Twitter to Kafka → Kafka Topic → Kafka Spout).
  • 26. Sample (#2), continued (Kafka Topic → Kafka Spout): use the existing implementation of the Kafka spout from the storm-kafka project; create a Kafka configuration; create the Kafka spout to connect to the topic.
  • 27. Sample (#3): 3) Use a bolt to retrieve the hashtags for each tweet (Twitter Stream → Twitter to Kafka → Kafka Topic → Kafka Spout → Hashtag Splitter).
  • 28. Sample (#3), continued (Kafka Topic → Kafka Spout → Hashtag Splitter): execute is called for each tuple arriving on the input stream; declareOutputFields declares the output fields for the component and is called when the bolt is created.
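The splitter's two responsibilities can be mirrored in a small Python class: `declare_output_fields` names the fields of the emitted tuples, and `execute` runs once per incoming tuple and may emit zero or more tuples. This is a sketch of the pattern only. It plays the role of the `execute` and `declareOutputFields` methods of a Java Storm bolt but is not the Storm API, and both the hashtag regex and the lowercasing are assumptions.

```python
import re
from typing import Callable, Dict, List, Tuple

HASHTAG = re.compile(r"#(\w+)")  # assumption: a hashtag is '#' followed by word characters


class HashtagSplitter:
    """Toy bolt: one input tuple (a tweet) -> zero or more output tuples (hashtags)."""

    def declare_output_fields(self) -> Tuple[str, ...]:
        # Names the fields of every tuple this component emits.
        return ("hashtag",)

    def execute(self, tup: Dict[str, str], emit: Callable[[Tuple[str, ...]], None]) -> None:
        # Called for each tuple arriving on the input stream.
        for tag in HASHTAG.findall(tup["tweet"]):
            emit((tag.lower(),))


splitter = HashtagSplitter()
out: List[Tuple[str, ...]] = []
splitter.execute(
    {"tweet": "My presentation tomorrow 12:00 at Jazoon Zurich #jazoon #storm #kafka CU there!"},
    out.append,
)
fields = splitter.declare_output_fields()
print([dict(zip(fields, values)) for values in out])
# [{'hashtag': 'jazoon'}, {'hashtag': 'storm'}, {'hashtag': 'kafka'}]
```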
  • 29. Sample (#4): 4) Use a bolt to count the occurrences of hashtags in a Redis NoSQL DB (Twitter Stream → Twitter to Kafka → Kafka Topic → Kafka Spout → Hashtag Splitter → Hashtag Counter).
  • 30. Sample (#4), continued (Kafka Topic → Kafka Spout → Hashtag Splitter → Hashtag Counter): execute is called for each tuple arriving on the input stream; prepare is called when the bolt is created.
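The counter bolt follows the same shape, plus a `prepare` step that sets up the database connection once, before any tuple arrives. In this sketch an in-memory `FakeRedis` stands in for Redis, and its `incr` imitates the semantics of the Redis `INCR` command: a missing key counts as 0, it is incremented by one, and the new value is returned. The class and method names are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional


class FakeRedis:
    """In-memory stand-in for a Redis connection (only the INCR idea is modeled)."""

    def __init__(self) -> None:
        self._data: Counter = Counter()

    def incr(self, key: str) -> int:
        # Like Redis INCR: a missing key counts as 0, then it is incremented by one.
        self._data[key] += 1
        return self._data[key]

    def get(self, key: str) -> int:
        return self._data[key]


class HashtagCounter:
    """Toy bolt that counts hashtag occurrences in an external store."""

    def __init__(self) -> None:
        self.store: Optional[FakeRedis] = None

    def prepare(self) -> None:
        # Called once when the bolt is created: open the (fake) connection here,
        # not in execute, so the setup cost is paid once per bolt instance.
        self.store = FakeRedis()

    def execute(self, tup: Dict[str, str]) -> int:
        # Called for each incoming tuple carrying a 'hashtag' field.
        assert self.store is not None, "prepare() must run before execute()"
        return self.store.incr(tup["hashtag"])


counter = HashtagCounter()
counter.prepare()
for tag in ["jazoon", "storm", "kafka", "jazoon"]:
    counter.execute({"hashtag": tag})
print(counter.store.get("jazoon"), counter.store.get("storm"))  # 2 1
```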
  • 31. Main Concepts - Topology. (Figure: Kafka Topic → Kafka Spout → shuffle grouping → two Hashtag Splitter instances → fields grouping → two Hashtag Counter instances.) Each spout or bolt runs N instances in parallel. • Shuffle grouping: random distribution of tuples • Fields grouping: tuples are grouped by value, so that equal values go to the same task • All grouping: replicates each tuple to all tasks • Global grouping: all tuples go to one task • None grouping: the grouping doesn't matter; it is meant to eventually run the bolt in the same thread as the bolt/spout it subscribes to, and currently behaves like shuffle grouping • Direct grouping: the producer (the task that emits) decides which consumer task receives the tuple
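The groupings differ only in how the target task index is chosen. The functions below are a sketch: shuffle picks a task at random, fields grouping hashes the grouping field so equal values always land on the same task, all grouping returns every task, and global grouping returns a single one. Storm uses its own hash function; `zlib.crc32` is chosen here only because Python's built-in `hash` of strings is randomized per process. Function names are hypothetical.

```python
import random
import zlib
from typing import Dict, List


def shuffle_grouping(num_tasks: int, rng: random.Random) -> int:
    # Random distribution: any task may receive the tuple.
    return rng.randrange(num_tasks)


def fields_grouping(value: str, num_tasks: int) -> int:
    # Deterministic: equal values always map to the same task index.
    return zlib.crc32(value.encode("utf-8")) % num_tasks


def all_grouping(num_tasks: int) -> List[int]:
    # Replicate the tuple to every task.
    return list(range(num_tasks))


def global_grouping() -> int:
    # Every tuple goes to a single task (task 0 in this sketch).
    return 0


hashtags = ["jazoon", "storm", "kafka", "jazoon", "storm", "jazoon"]
counter_tasks = 2
per_task: Dict[int, Dict[str, int]] = {t: {} for t in range(counter_tasks)}
for tag in hashtags:
    task = fields_grouping(tag, counter_tasks)
    per_task[task][tag] = per_task[task].get(tag, 0) + 1

# Each hashtag is counted by exactly one task, so per-task counts never need merging.
print(per_task)
rng = random.Random(42)
print([shuffle_grouping(2, rng) for _ in range(5)])
```

This is why the counters in the sample sit behind a fields grouping on `hashtag`, while the splitters, which keep no per-key state, can take tuples from a shuffle grouping.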
  • 32. Sample – How does it work? Tweet: My presentation „Kafka and Storm – event processing in realtime“ tomorrow 12:00 at Jazoon Zurich #jazoon #storm #kafka CU there! The tweet arrives on the Kafka topic and is read by the Kafka spout. (Figure: Kafka Topic → Kafka Spout → two Hashtag Splitters → two Hashtag Counters.)
  • 33. Sample – How does it work? For the same tweet, a Hashtag Splitter extracts the hashtags jazoon, storm and kafka and emits them to the Hashtag Counters.
  • 34. Sample – How does it work? The Hashtag Counters issue INCR jazoon, INCR storm and INCR kafka, resulting in jazoon = 1, storm = 1, kafka = 1.
  • 35. Sample (#5): 5) Set up the topology with the necessary groupings (distribution): create a stream called “tweet-stream”; register the Kafka spout and run it with 1 worker; subscribe the Hashtag Splitter to the stream “tweet-stream” using shuffle grouping and run this bolt with 2 workers; subscribe the Hashtag Counter to “hashtag-splitter” using fields grouping on the field “hashtag”.
  • 36. Trident • High-level abstraction on top of Storm • Makes it easier to build topologies • Core data model is the stream, processed as a series of batches • The stream is partitioned among the nodes in the cluster • 5 kinds of operations in Trident: (1) operations that apply locally to each partition and cause no network transfer; (2) repartitioning operations that don't change the contents; (3) aggregation operations that do network transfer; (4) operations on grouped streams; (5) merges and joins
  • 37. Trident supports •  Joins •  Aggregations •  Grouping •  Functions •  Filters (similar to Hadoop and Pig/Cascading)
  • 38. Trident Concepts - Topology [Diagram: Kafka Topic → Kafka Spout → (tweet) → bolt: Hashtag Splitter → (hashtag, local) → Hashtag Normalizer → (hashtag, groupBy) → bolt: Persistent Aggregate]
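The Trident topology above can be approximated with Java streams, one batch at a time. Each batch goes through four steps: split tweets into hashtags, normalize them locally, group by hashtag, and merge the per-batch counts into state that outlives the batch. Three details are assumptions: splitting on whitespace, lower-casing as the normalization step, and an in-memory map standing in for the persistent state. This is not the Trident API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical batch-at-a-time model of the Trident topology (not the Trident API).
class TridentPipelineSketch {

    // Stand-in for the store behind "Persistent Aggregate"; it outlives single batches.
    private final Map<String, Long> state = new HashMap<>();

    void processBatch(List<String> tweets) {
        Map<String, Long> batchCounts = tweets.stream()
                // Hashtag Splitter: split each tweet into words and keep the hashtags
                .flatMap(tweet -> Arrays.stream(tweet.split("\\s+")))
                .filter(word -> word.startsWith("#") && word.length() > 1)
                // Hashtag Normalizer (assumed: drop '#', lower-case); a per-tuple, local step
                .map(word -> word.substring(1).toLowerCase(Locale.ROOT))
                // groupBy hashtag, then count within this batch
                .collect(Collectors.groupingBy(tag -> tag, Collectors.counting()));
        // Persistent aggregate: fold the batch result into the stored totals
        batchCounts.forEach((tag, n) -> state.merge(tag, n, Long::sum));
    }

    Map<String, Long> counts() {
        return new HashMap<>(state);
    }

    public static void main(String[] args) {
        TridentPipelineSketch pipeline = new TridentPipelineSketch();
        pipeline.processBatch(List.of("Talk at #Jazoon about #Storm", "#storm and #kafka"));
        pipeline.processBatch(List.of("#KAFKA rocks"));
        System.out.println(new TreeMap<>(pipeline.counts())); // prints {jazoon=1, kafka=2, storm=2}
    }
}
```

The design point mirrored here is that only the grouped aggregation needs to see all tuples for a hashtag. Splitting and normalizing can run wherever the tuple already is.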
  • 39. Trident Concepts - Function •  Takes in a set of input fields and emits zero or more tuples as output •  Fields of the output tuple are appended to the original input tuple in the stream •  If a function emits no tuples, the original input tuple is filtered out •  Otherwise, the input tuple is duplicated for each output tuple
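The four rules above can be checked with a small simulation in which a tuple is a plain `List<Object>` and a function returns a list of output tuples. This is a hypothetical sketch of the semantics, not Trident's `Function` interface. The example function emits one tuple per value from 0 up to (but excluding) the first field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical simulation of the function semantics described on the slide.
class TridentFunctionSemanticsSketch {

    // Output fields are appended to the input tuple; no output drops the input;
    // several outputs duplicate the input once per output tuple.
    static List<List<Object>> each(List<List<Object>> input,
                                   Function<List<Object>, List<List<Object>>> fn) {
        List<List<Object>> result = new ArrayList<>();
        for (List<Object> in : input) {
            for (List<Object> out : fn.apply(in)) {
                List<Object> joined = new ArrayList<>(in);
                joined.addAll(out);
                result.add(joined);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Example function: for a tuple [a, b] emit [i] for every i from 0 to a - 1.
        Function<List<Object>, List<List<Object>>> range = t -> {
            List<List<Object>> out = new ArrayList<>();
            for (int i = 0; i < (Integer) t.get(0); i++) {
                out.add(List.of(i));
            }
            return out;
        };
        List<List<Object>> input = List.of(List.of(1, 2), List.of(0, 5), List.of(2, 3));
        System.out.println(each(input, range)); // prints [[1, 2, 0], [2, 3, 0], [2, 3, 1]]
    }
}
```

The tuple `[0, 5]` disappears because the function emits nothing for it, and `[2, 3]` appears twice because two tuples are emitted.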
  • 40. Agenda 1.  Introduction 2.  What is Apache Kafka? 3.  What is Twitter Storm? 4.  Summary
  • 41. Summary (Kafka, Storm) •  Distributed, scalable pub/sub system for Big Data •  Expresses real-time processing naturally •  Producer => Broker => Consumer of message topics •  Persists messages with the ability to rewind •  Consumer decides what it has consumed so far •  Not a Hadoop/MapReduce competitor •  Supports other languages •  Hard to debug •  Object serialization •  Didn't cover §  Reliability through guaranteed message processing §  Distributed RPC §  Storm cluster setup and deployment http://54.217.159.208:8484/jazoon-restapi/
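The Kafka points in the summary can be modelled with an in-memory, append-only log: producer, broker and consumer; messages that are retained after reading; a position kept by the consumer; and rewind. The classes below are hypothetical and only illustrate the idea. They are not the Kafka client API, and a real broker keeps messages only for its configured retention period.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of one Kafka topic partition (not the Kafka client API).
class LogSketch {

    // Broker side: an append-only log; messages stay after they have been read.
    static final class TopicLog {
        private final List<String> messages = new ArrayList<>();

        // Producer => Broker: append a message and return its offset.
        long append(String message) {
            messages.add(message);
            return messages.size() - 1;
        }

        // Broker => Consumer: read up to max messages starting at an offset.
        List<String> read(long fromOffset, int max) {
            int from = (int) fromOffset;
            int to = Math.min(messages.size(), from + max);
            if (from >= to) {
                return new ArrayList<>();
            }
            return new ArrayList<>(messages.subList(from, to));
        }
    }

    // Consumer side: the consumer itself tracks how far it has consumed.
    static final class Consumer {
        private final TopicLog log;
        private long offset = 0;

        Consumer(TopicLog log) {
            this.log = log;
        }

        List<String> poll(int max) {
            List<String> batch = log.read(offset, max);
            offset += batch.size();
            return batch;
        }

        // Rewind (or skip ahead) by moving the consumer's own position.
        void seek(long newOffset) {
            offset = newOffset;
        }

        long position() {
            return offset;
        }
    }

    public static void main(String[] args) {
        TopicLog tweets = new TopicLog();
        tweets.append("tweet-1");
        tweets.append("tweet-2");
        tweets.append("tweet-3");
        Consumer consumer = new Consumer(tweets);
        System.out.println(consumer.poll(2));  // prints [tweet-1, tweet-2]
        consumer.seek(0);                      // rewind to the beginning
        System.out.println(consumer.poll(10)); // prints [tweet-1, tweet-2, tweet-3]
    }
}
```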
  • 42. Lambda Architecture
 Big Data and Fast Data combined [Diagram: Incoming Data goes to the Batch Layer (all data; batch recompute → precomputed information as batch views in the Serving Layer) and to the Speed Layer (process stream; realtime increment → incremented information as real-time views); a query merges the batch views and the real-time views] Source: Marz, N. & Warren, J. (2013) Big Data. Manning.
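At query time the two layers are combined. For a counter such as the hashtag count, the merge can be as simple as adding the real-time view's increments to the batch view's value. The class below is a hypothetical in-memory sketch. Discarding the whole real-time view when a new batch view arrives is a simplifying assumption; in practice only the part already covered by the batch run is discarded.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory sketch of the Lambda Architecture's query-time merge for counts.
class LambdaMergeSketch {

    // Batch view: result of the last batch recomputation over all data.
    private final Map<String, Long> batchView = new HashMap<>();
    // Real-time view: increments from the speed layer for data not yet in the batch view.
    private final Map<String, Long> realtimeView = new HashMap<>();

    // Batch layer finished a run: swap in the new view. Simplification: assume the run
    // covered everything seen so far, so the real-time increments can be dropped.
    void replaceBatchView(Map<String, Long> recomputed) {
        batchView.clear();
        batchView.putAll(recomputed);
        realtimeView.clear();
    }

    // Speed layer: incremental update per incoming event.
    void incrementRealtime(String key) {
        realtimeView.merge(key, 1L, Long::sum);
    }

    // Query: merge batch view and real-time view.
    long query(String key) {
        return batchView.getOrDefault(key, 0L) + realtimeView.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        LambdaMergeSketch views = new LambdaMergeSketch();
        views.replaceBatchView(Map.of("storm", 10L));
        views.incrementRealtime("storm");
        System.out.println(views.query("storm")); // prints 11
    }
}
```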
  • 43. Lambda Architecture
 Big Data and Fast Data combined, one possible product/framework mapping [Diagram: the Batch Layer, Serving Layer and Speed Layer from slide 42 (batch views and real-time views merged at query time), each mapped to a product or framework]
  • 44. Resources •  Sample Code: https://github.com/gschmutz/jazoon-storm-twitter-sample •  Twitter Streaming API: https://dev.twitter.com/docs/streaming-apis •  Twitter Hosebird Client (HBC): https://github.com/twitter/hbc •  Apache Kafka: http://kafka.apache.org/ •  Storm Website: http://storm-project.net/ •  Storm Wiki: https://github.com/nathanmarz/storm/wiki •  Storm Doc: https://github.com/nathanmarz/storm/wiki/Documentation
  • 45. Thank You! Trivadis AG, Guido Schmutz, guido.schmutz@trivadis.com

  • 46. Lambda Architecture
 Big Data and Fast Data combined [Diagram, steps labelled A to G: Incoming Data is stored as immutable data in the Batch Layer, which produces the Batch View in the Serving Layer, and is fed as a data stream into the Speed Layer, which produces the Realtime View; a Query reads both views]