Kappa Architecture
for Event Processing
Piotr Czarnas
Querona CEO
The anatomy of event processing
What is an event
A user performed an
action in the application
A customer just ordered a
product
An event is something that just happened
and requires a quick reaction
Information (data)
received from an external
partner
A frequent customer
ordered another product
The need for Event Streaming
Qualification → Reaction
High-frequency events
• A lot of events happen
• Some of them are valuable
• Some require our reaction
• We have little time to act
How events are processed
Gather events → Store & forward → Process → React → Action
Valuable events require a reaction
Complex events
• The same event happened again
• An event connected with external
data is a different event
External
data
Complex events are high-level events based on multiple data points
→ Complex events have a real business value
Complex events identification
Reaction
Simple events:
A customer
logged in
A customer
dropped a
shopping basket
Convert to a
complex event
A complex event may be identified and added to the event stream
External data
Source of events
Actions performed by users in applications
Messages from a corporate event bus (EAI)
Complex events identified by correlation of
multiple events
Row changes in databases (CDC)
Analytical advancement
Analytical advancement ladder (increasing business value):
Descriptive analytics: What has happened?
Diagnostic analytics: Why did it happen?
Predictive analytics: What will happen?
Prescriptive analytics: What can we do to make it happen?
Event processing value proposition
Predictive analytics
Prescriptive analytics
Learn what we can get from events
Identify and act on events
Event processing requires two processes: learning and acting
Event consumers
Data scientists & data analysts
identify valuable events
Events are consumed for learning and for performing actions
Reaction to events
Reaction to new events in the future
Events need to be re-read many times
Kappa architecture
Classic Lambda Architecture
Limitations of the Lambda Architecture
• The batch layer and the speed layer require double processing
• Changes to the processing logic must be reimplemented in both
processing pipelines
• The whole view of all data is possible only through a virtual query that is
a union of the batch and the speed layers
But do we really need a speed layer that is always up to date?
Lambda Architecture for log monitoring
Lambda Architecture is good for log monitoring, not for business events
Lambda Architecture for CDC data synchronization
Lambda Architecture is good for keeping a copy of rows
from an OLTP database
Insert / Update / Delete (source DB)
Key/value store database (HBase, Cassandra, etc.)
Kappa Architecture
Only one processing logic!
Kappa architecture data lag
As long as the reaction time to an event is longer than the
processing time, we can work with the data lag
Output table N → Output table N+1
(15-minute batch lag vs. human reaction lag)
Kappa Architecture benefits
• Kafka is the only source
• Only one processing logic
• Multiple types of analyses possible
• New results available in a new table
Predictive analytics
Prescriptive analytics
Actionable analytics (learning + reacting) much easier
Event storage
What is Apache Kafka
Consumer 1
Consumer 2
Apache Kafka is a high-throughput publish-subscribe event bus
Event publishers
System 1
System 2
Event consumers
Kafka topic
Apache Kafka partitioning
Kafka rules:
• Topics are partitioned
• Partitions are append-only files
• Partitions are distributed across nodes
• Write speed: 1 million events/sec/partition
• Read speed: 2 million events/sec/partition
Kafka topic
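A minimal producer sketch may help illustrate how events are appended to partitions; the broker address, the topic name "events" and the key/value payload below are illustrative assumptions, not part of the original deck:
Properties producerProps = new Properties();
producerProps.put("bootstrap.servers", "localhost:9092");
producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
// Records with the same non-null key hash to the same partition, so per-key ordering is preserved
producer.send(new ProducerRecord<>("events", "customer-42", "{\"action\":\"order\"}"));
}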
Apache Kafka consumer groups
Consumer 1
Consumer 2
A consumer group
All consumers in a group share a group.id
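For reference, a minimal sketch of a group consumer; the group id "reaction-service" and the topic name "events" are illustrative assumptions. Every consumer started with the same group.id receives a share of the topic's partitions:
Properties groupProps = new Properties();
groupProps.put("bootstrap.servers", "localhost:9092");
groupProps.put("group.id", "reaction-service");
groupProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
groupProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> groupConsumer = new KafkaConsumer<>(groupProps);
groupConsumer.subscribe(Arrays.asList("events"));   // partitions are balanced across the group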
Apache Kafka offset storage for a group.id
But in Kappa Architecture we do not care about offsets;
we read everything again
• Event streaming consumer must
keep the last read offset for each
partition
• Offset storage is specified by
offset.storage.[topic]
• Offset stores: Zookeeper, Kafka,
custom
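Since a Kappa-style reader re-reads the topic and seeks explicitly, one option (an assumption, not prescribed by the deck) is to skip offset commits altogether:
consumerProps.put("enable.auto.commit", "false");   // do not store offsets; each run seeks to the beginning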
Waiting for new events on Apache Kafka
The consumer can still
read from partition 0
The consumer has
reached the end of all
partitions and is waiting
A consumer that has reached the end of an assigned partition
waits for new events for the duration of the "poll" timeout period
Partition 0
Partition 1
Partition 2
Partition 3
Reading events without waiting at the end of a partition
KafkaConsumer<~> consumer = ...
ConsumerRecords<~> records = consumer.poll(10000);
We must stop listening to a partition when we reach the last event,
or the reader will wait (or keep consuming new events) forever
Partition 1
Reading events the easy way (1/3): setup
Properties consumerProps = new Properties();
consumerProps.put("bootstrap.servers", "localhost:9092");
consumerProps.put("group.id", "consumer group here");
consumerProps.put("key.deserializer",
"org.apache.kafka.common.serialization.StringDeserializer");
consumerProps.put("value.deserializer",
"org.apache.kafka.common.serialization.StringDeserializer");
A random group.id must be used
Reading events the easy way (2/3): partition offset seek
KafkaConsumer<String, String> consumer =
new KafkaConsumer<String, String>(consumerProps);
List<PartitionInfo> partitionInfos =
consumer.partitionsFor("topic name here");
List<TopicPartition> topicPartitions =
partitionInfos.stream()
.map(pi -> new TopicPartition(pi.topic(), pi.partition()))
.collect(Collectors.toList());
consumer.assign(topicPartitions);
consumer.seekToBeginning(topicPartitions);
But we can also find offsets by a timestamp and "rewind" to them
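A sketch of that timestamp-based rewind, assuming we only want events from the last 24 hours (the lookback window is an arbitrary example):
long since = System.currentTimeMillis() - 24L * 60 * 60 * 1000;
Map<TopicPartition, Long> searchTimestamps = topicPartitions.stream()
.collect(Collectors.toMap(tp -> tp, tp -> since));
Map<TopicPartition, OffsetAndTimestamp> foundOffsets = consumer.offsetsForTimes(searchTimestamps);
foundOffsets.forEach((tp, offsetAndTimestamp) -> {
if (offsetAndTimestamp != null)   // null when no record is newer than the timestamp
consumer.seek(tp, offsetAndTimestamp.offset());
});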
Reading events the easy way (3/3): reading loop
Map<TopicPartition, Long> endOffsets = consumer.endOffsets(topicPartitions);
int remainingPartitionsCount = endOffsets.size();
while(remainingPartitionsCount > 0) {
ConsumerRecords<String, String> consumerRecords = consumer.poll(10000);
for (ConsumerRecord<String, String> record : consumerRecords) {
TopicPartition recordPartition = new TopicPartition(record.topic(), record.partition());
long endOffset = endOffsets.get(recordPartition);
if (record.offset() == endOffset - 1) {
remainingPartitionsCount--;
consumer.pause(Arrays.asList(recordPartition));
}
if (record.offset() < endOffset)
processRecord(record);
}
if (consumerRecords.isEmpty())
break;
}
Bounded event reading on Apache Spark
1. Create a custom RDD or DataFrame that
reads from Apache Kafka
2. Register your RDD in the Spark context
3. Just run SQL on the DataFrame (see the sketch below)
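Steps 2 and 3 are not shown on the following slides, so here is a hedged sketch of wrapping the KafkaTopicRDD (defined next) in a Dataset and querying it; the JSON event payload, the topic name "events" and the customerId column are assumptions:
SparkSession spark = SparkSession.builder().appName("kappa-reprocessing").getOrCreate();
KafkaTopicRDD kafkaRdd = new KafkaTopicRDD(
spark.sparkContext(), "localhost:9092", UUID.randomUUID().toString(), "events", 10000L);
// Each RDD element is one event value; treat it as a JSON line and let Spark infer the schema
Dataset<String> rawEvents = spark.createDataset(kafkaRdd, Encoders.STRING());
Dataset<Row> events = spark.read().json(rawEvents);
events.createOrReplaceTempView("events");
spark.sql("SELECT customerId, COUNT(*) AS orders FROM events GROUP BY customerId").show();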
Bounded Spark RDD (1/6): RDD declaration
public static class KafkaTopicRDD extends org.apache.spark.rdd.RDD<String> {
private static final ClassTag<String> STRING_TAG =
ClassManifestFactory$.MODULE$.fromClass(String.class);
private static final long serialVersionUID = 1L;
private String kafkaServer;
private String groupId;
private String topic;
private long timeout;
public KafkaTopicRDD(SparkContext sc, String kafkaServer, String groupId, String topic,
long timeout) {
super(sc, new ArrayBuffer<Dependency<?>>(), STRING_TAG);
this.kafkaServer = kafkaServer;
this.groupId = groupId;
this.topic = topic;
this.timeout = timeout;
}
Bounded Spark RDD (2/6): RDD’s compute
@Override
public Iterator<String> compute(Partition arg0, TaskContext arg1) {
KafkaTopicPartition p = (KafkaTopicPartition)arg0;
KafkaConsumer<String, String> kafkaConsumer = createKafkaConsumer();
TopicPartition partition = new TopicPartition(topic, p.partition);
kafkaConsumer.assign(Arrays.asList(partition));
kafkaConsumer.seek(partition, p.startOffset);
return new KafkaTopicIterator(kafkaConsumer, p.endOffset, this.timeout);
}
private KafkaConsumer<String, String> createKafkaConsumer() {
Properties consumerProps = new Properties();
consumerProps.put("bootstrap.servers", this.kafkaServer);
consumerProps.put("group.id", this.groupId);
consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(consumerProps);
return consumer;
}
Each Kafka partition is processed as a separate Spark task
Bounded Spark RDD (3/6): Kafka → Spark partition
public static class KafkaTopicPartition implements Partition {
private static final long serialVersionUID = 1L;
private int partition;
private long startOffset;
private long endOffset;
public KafkaTopicPartition(int partition, long startOffset, long endOffset) {
this.partition = partition;
this.startOffset = startOffset;
this.endOffset = endOffset;
}
@Override
public int index() { return partition; }
@Override
public boolean equals(Object obj) { return ... }
@Override
public int hashCode() { return index(); }
}
Bounded Spark RDD (4/6): partition enumeration
@Override
public Partition[] getPartitions() {
KafkaConsumer<String, String> kafkaConsumer = createKafkaConsumer();
List<PartitionInfo> partitionInfos = kafkaConsumer.partitionsFor(this.topic);
List<TopicPartition> topicPartitions = partitionInfos.stream()
.map(pi -> new TopicPartition(this.topic, pi.partition())).collect(Collectors.toList());
Map<TopicPartition, Long> beginOffsets = kafkaConsumer.beginningOffsets(topicPartitions);
Map<TopicPartition, Long> endOffsets = kafkaConsumer.endOffsets(topicPartitions);
Partition[] partitions = new Partition[partitionInfos.size()];
for(int i = 0; i < partitionInfos.size(); i++) {
PartitionInfo partitionInfo = partitionInfos.get(i);
TopicPartition topicPartition = topicPartitions.get(i);
partitions[i] = new KafkaTopicPartition(
partitionInfo.partition(),
beginOffsets.get(topicPartition),
endOffsets.get(topicPartition));
}
return partitions;
}
Bounded Spark RDD (5/6): events iterator
public static class KafkaTopicIterator extends AbstractIterator<String> {
private KafkaConsumer<String, String> kafkaConsumer;
private long endOffset, timeout;
private ConsumerRecords<String, String> recordsBatch;
private java.util.Iterator<ConsumerRecord<String, String>> recordIterator;
private ConsumerRecord<String, String> currentRecord;
private boolean lastRecordReached;
public KafkaTopicIterator(KafkaConsumer<String, String> kafkaConsumer, long endOffset, long timeout) {
this.kafkaConsumer = kafkaConsumer; this.endOffset = endOffset; this.timeout = timeout;
}
@Override
public String next() {
// hasNext() loads the next record into currentRecord when one is available
if (currentRecord == null && !hasNext())
throw new NoSuchElementException();   // java.util.NoSuchElementException
String value = currentRecord.value();
currentRecord = null;
return value;
}
Bounded Spark RDD (6/6): iterator’s hasNext
@Override
public boolean hasNext() {
if (currentRecord != null) return true;
if (lastRecordReached) return false;
// Keep polling until a record is available; a single poll() may return
// only part of the remaining records before endOffset
while (recordIterator == null || !recordIterator.hasNext()) {
recordsBatch = this.kafkaConsumer.poll(this.timeout);
if (recordsBatch.isEmpty()) return false;   // nothing more arrived within the timeout
recordIterator = recordsBatch.iterator();
}
currentRecord = recordIterator.next();
if (currentRecord.offset() >= endOffset) {
currentRecord = null;
return false;
}
if (currentRecord.offset() >= endOffset - 1)
lastRecordReached = true;
return true;
}
The anatomy of a Kafka event
Key Value
• Records in Kafka have a key and a value
• Both the key and the value are binary, serialized
by a serializer of choice
• JSON, String or Avro serializers are useful
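For example, a producer can be pointed at String or Avro serializers through configuration; the Confluent Avro serializer and the schema registry URL below are assumptions (they require Confluent's serializer artifact and a running schema registry):
producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
producerProps.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
producerProps.put("schema.registry.url", "http://localhost:8081");   // required by the Avro serializer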
Apache Kafka log compaction
The default (delete) log cleanup
policy removes old entries
The "compact" cleanup policy keeps the
newest version of a record for each key
log.cleanup.policy=delete (default): removes entries older than the retention period (e.g. 1 week)
log.cleanup.policy=compact: keeps only the newest record for each key
The compact cleanup policy is required for Kappa Architecture
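A hedged sketch of switching an existing topic to the compacted policy with the Kafka AdminClient; the topic name "events" and the broker address are illustrative:
Properties adminProps = new Properties();
adminProps.put("bootstrap.servers", "localhost:9092");
try (AdminClient admin = AdminClient.create(adminProps)) {
ConfigResource topicResource = new ConfigResource(ConfigResource.Type.TOPIC, "events");
Config compacted = new Config(
Collections.singletonList(new ConfigEntry("cleanup.policy", "compact")));
admin.alterConfigs(Collections.singletonMap(topicResource, compacted)).all().get();
}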
Querona
Your existing data sources: keep them as they are to reduce risk
ETL-free virtual database, Apache Spark powered: set up to facilitate & accelerate BI
User-friendly front-ends: use what users have known for years
Querona – Data Virtualization engine
Complete Logical Data Warehouse: ETL-free, self-service, Big Data ready, utilizing Apache Spark.
Data sources
CRM
ERP
OLTP
Client tools
Connects all data sources (~100)
Simple data loading (3 clicks)
Joins data from many sources (instant)
Real-time data access
Enables GDPR/RODO compliance
QUERONA – Logical Data Warehouse
Why Querona
Data Virtualization (DV) is not a new idea, but since 2016 Gartner has
considered DV a key trend in Data Warehousing and Data Analytics
• Self-service → more people can use data
• SQL Server wire compatibility → compatible with any client tool
• Apache Spark bundled → "Big Data Ready" in 5 minutes
• Competitive licensing model → DV available for all companies
Which table has
first names?
Find any data source
What do we
have here?
Data preview in one place
Maybe we can
correlate that
with events?
Is the data source
incapable of
real-time access?
Caching – just a few clicks
Let’s cache it on
Apache Spark or
in the cloud
Is more information
about an event
stored in a CRM?
Joining data
Let’s build a 360°
customer profile
as an SQL view!
Augmented events
Original events (Kafka)
Augmented events visible as SQL Server compatible views
V_EVENTS_CUSTOMER_INFO
V_EVENTS_PRODUCT_INFO
V_EVENT_SALES
CRM
Product database
ERP
V_EVENT_CAMPAIGN_GOALS
Marketing platform
External data sources for event augmentation
SaaS
Social media
Business partner's database (partners)
Public data
Kappa Architecture full data lifecycle rules
• Treat Apache Kafka as a persistent event source
• Get ready for both event analysis (learn) and reacting to
events (act)
• Identify all additional data that may augment events
• Make sure that you can reprocess events at any time
• Expose complex events for consumption (dashboards,
activities created in CRM, etc.)
Piotr Czarnas
CEO
Querona Ltd.
piotr.czarnas@querona.com
+48 536 133 114
www.querona.com