SlideShare a Scribd company logo
1 of 8
Download to read offline
Solution for events logging
How does Akka Streams in couple with Kafka make it easier to manage
data flows
2Sementsov A., 2020
Architecture: Step 1
• Client communicate with server using Kafka broker
• Consumer writes each record in both storages: PostgreSQL and
Hadoop (possible Hive)
• PostgreSQL used for operational data with maximum storage
period of about 1-3 months. It is the storage with fast search
capability
• Hadoop and related components used as cheaper storage, but
with slower access
Bird's eye view
Challenges
1. It is difficult to maintain consistency when writing to multiple
repositories simultaneously
2. Consumer must use existing access rights provider, monitoring
and logging system
3. How to reduce the amount of developed code and speed up
the process of adding new repositories
4. How to choose the optimal storage format from all the
possible options offered by the Hadoop and related products
5. No vendor lock-in allowed
3Sementsov A., 2020
Architecture: Step 2
Finish solution
1. Consistency – Kafka provides functionality to create several groups
of consumers. Each group process all messages independently
2. There are many tools such as Fluent, LogStash, Flume provides “get
-> process -> put” functionality. However for two main reasons
(Refusal to vendor lock-in and integration with our proprietary
systems) we decided to develop our own
3. We chose Akka Streams library to reduce the amount of developed
code and speed up the process of adding new storages
4. Hadoop write stages
1. Consumer service writes files directly to HDFS in Apache Avro
format. Pros Avro - fast write and compression. Cons – it has
no indexes, so the search is very slow
2. The Apache Oozie plans a task to convert Avro files into ORC
table using Hive. Apache ORC files has several advantages:
• Has indexes
• Allowed columnar encryption and/or masking
Cons
• File can be written only once because indexes must be
added at the end of file
5. Both PostgreSQL and Hive have JDBC drivers to access data
4Sementsov A., 2020
Akka Streams: Consumer application architecture
Consumer application
• HTTP endpoints protected by desired access provider.
Endpoint developed using Akka Http
• HTTP endpoint backed by KafkaProcessor Actor. Actor
can process two type of messages – Start and Stop
• KafkaProcessor can start & stops several kind of streams:
• PostgresFlow
• HdfsAvroFlow
• Trait BaseKafkaFlow provides common functions:
• Logging & monitoring functionality
• Get messages from and commit to Kafka
• Extension points to parse and store messages
• Each of final streams extends BaseKafkaFlow trait and
implements flow stages for
• Parser
• Store
5Sementsov A., 2020
Akka Streams coding: Base classes
These two traits – BaseProcessorFlow and BaseKafkaFlow provide functions which
cover entire processing procedure.
Type parameters In and P allowed to use different sources and process different
messages
For BaseKafkaFlow source type is KafkaMessage, which defined as
BaseProcessorFlow has two abstract values
• parse – will be used to convert incoming value into PassThrough[In, P]
• saveBatch – will be used to store batch data and transfer all processed data to
downstream. This stage can be cause of backpressure
BaseKafkaFlow extends base processor by adding ability to receive and send
messages. BaseKafkaFlow is an Actor. Also this class have states.
BaseKafkaFlow can process the following messages:
• StartKafkaConsumer(startConfig) – Message contains parameters to create and
start stream connected to Kafka brokers. Stream can read message from Kafka,
processing its and committing offset to Kafka partitions when messages
processes successfully. Moreover, startConfig contains information about
consumer group.
• StopKafkaConsumer - Message used to stop all streams gracefully
Main application actor can start many actors of different classes derived from
BaseKafkaFlow
6Sementsov A., 2020
Akka Streams coding: Storages
PostgresFlow implements parser to construct SomeEvent object stored as JSON. If
JSON parsing was not success then constructs SomeEvent object which contains
error.
saveBatch stores data into PostgreSQL database. In the case when the data cannot
be saved due to a database error, this function has repeatedly tried to save the
data. To avoid CPU overload, it has an ExponentialBackoff to control the time
between attempts. The whole batch is saved in one transaction in function
saveEvents
7Sementsov A., 2020
Akka Streams coding: Storages
To store Avro files we need more complex program code. For this purpose using
custom flow processing. HdfsAvroFileFlow defines FlowStage
HdfsAvroFileFlowLogic defines processing logic. It is necessary because we need
state to store output avro stream. One stream = one file in HDFS. In fact this logic is
very simple. Open file is it not open and write file if it open. From time to time, the
file is forced to close, rotation occurs
writeAvroEvents function writes data and pass successfully written data to the
downstream. Due to this action only data which actually written to HDFS will be
committed to Kafka
However, there is one point to consider. Kafka stream will not reread uncommitted
records. To do this we need to restart stream. It can be done either send special
message to actor or by throwing Exception during this stage
8Sementsov A., 2020
Links
• Akka main site - https://akka.io/
• Akka Streams - https://doc.akka.io/docs/akka/current/stream/index.htm
• Author email -anatolse@gmail.com

More Related Content

What's hot

Kafka to Hadoop Ingest with Parsing, Dedup and other Big Data Transformations
Kafka to Hadoop Ingest with Parsing, Dedup and other Big Data TransformationsKafka to Hadoop Ingest with Parsing, Dedup and other Big Data Transformations
Kafka to Hadoop Ingest with Parsing, Dedup and other Big Data TransformationsApache Apex
 
Introduction to Spark Streaming
Introduction to Spark StreamingIntroduction to Spark Streaming
Introduction to Spark Streamingdatamantra
 
Structured Streaming with Kafka
Structured Streaming with KafkaStructured Streaming with Kafka
Structured Streaming with Kafkadatamantra
 
So You Want to Write a Connector?
So You Want to Write a Connector? So You Want to Write a Connector?
So You Want to Write a Connector? confluent
 
Deep Dive Into Kafka Streams (and the Distributed Stream Processing Engine) (...
Deep Dive Into Kafka Streams (and the Distributed Stream Processing Engine) (...Deep Dive Into Kafka Streams (and the Distributed Stream Processing Engine) (...
Deep Dive Into Kafka Streams (and the Distributed Stream Processing Engine) (...confluent
 
Apache Pulsar at Yahoo! Japan
Apache Pulsar at Yahoo! JapanApache Pulsar at Yahoo! Japan
Apache Pulsar at Yahoo! JapanStreamNative
 
State management in Structured Streaming
State management in Structured StreamingState management in Structured Streaming
State management in Structured Streamingdatamantra
 
Capture the Streams of Database Changes
Capture the Streams of Database ChangesCapture the Streams of Database Changes
Capture the Streams of Database Changesconfluent
 
Coprocessors - Uses, Abuses, Solutions - presented at HBaseCon East 2016
Coprocessors - Uses, Abuses, Solutions - presented at HBaseCon East 2016Coprocessors - Uses, Abuses, Solutions - presented at HBaseCon East 2016
Coprocessors - Uses, Abuses, Solutions - presented at HBaseCon East 2016Esther Kundin
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka StreamsGuozhang Wang
 
Introduction to Apache Kafka- Part 1
Introduction to Apache Kafka- Part 1Introduction to Apache Kafka- Part 1
Introduction to Apache Kafka- Part 1Knoldus Inc.
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using KafkaKnoldus Inc.
 
Apache Big Data EU 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data EU 2016: Next Gen Big Data Analytics with Apache ApexApache Big Data EU 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data EU 2016: Next Gen Big Data Analytics with Apache ApexApache Apex
 
Stream Processing using Apache Spark and Apache Kafka
Stream Processing using Apache Spark and Apache KafkaStream Processing using Apache Spark and Apache Kafka
Stream Processing using Apache Spark and Apache KafkaAbhinav Singh
 

What's hot (20)

Kafka to Hadoop Ingest with Parsing, Dedup and other Big Data Transformations
Kafka to Hadoop Ingest with Parsing, Dedup and other Big Data TransformationsKafka to Hadoop Ingest with Parsing, Dedup and other Big Data Transformations
Kafka to Hadoop Ingest with Parsing, Dedup and other Big Data Transformations
 
Introduction to Spark Streaming
Introduction to Spark StreamingIntroduction to Spark Streaming
Introduction to Spark Streaming
 
Structured Streaming with Kafka
Structured Streaming with KafkaStructured Streaming with Kafka
Structured Streaming with Kafka
 
Data Integration
Data IntegrationData Integration
Data Integration
 
So You Want to Write a Connector?
So You Want to Write a Connector? So You Want to Write a Connector?
So You Want to Write a Connector?
 
Deep Dive Into Kafka Streams (and the Distributed Stream Processing Engine) (...
Deep Dive Into Kafka Streams (and the Distributed Stream Processing Engine) (...Deep Dive Into Kafka Streams (and the Distributed Stream Processing Engine) (...
Deep Dive Into Kafka Streams (and the Distributed Stream Processing Engine) (...
 
Apache Pulsar at Yahoo! Japan
Apache Pulsar at Yahoo! JapanApache Pulsar at Yahoo! Japan
Apache Pulsar at Yahoo! Japan
 
A Short Presentation on Kafka
A Short Presentation on KafkaA Short Presentation on Kafka
A Short Presentation on Kafka
 
State management in Structured Streaming
State management in Structured StreamingState management in Structured Streaming
State management in Structured Streaming
 
Capture the Streams of Database Changes
Capture the Streams of Database ChangesCapture the Streams of Database Changes
Capture the Streams of Database Changes
 
Coprocessors - Uses, Abuses, Solutions - presented at HBaseCon East 2016
Coprocessors - Uses, Abuses, Solutions - presented at HBaseCon East 2016Coprocessors - Uses, Abuses, Solutions - presented at HBaseCon East 2016
Coprocessors - Uses, Abuses, Solutions - presented at HBaseCon East 2016
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka Streams
 
Apache Kafka Streams
Apache Kafka StreamsApache Kafka Streams
Apache Kafka Streams
 
Cloudera's Flume
Cloudera's FlumeCloudera's Flume
Cloudera's Flume
 
Fault tolerance
Fault toleranceFault tolerance
Fault tolerance
 
Introduction to Apache Kafka- Part 1
Introduction to Apache Kafka- Part 1Introduction to Apache Kafka- Part 1
Introduction to Apache Kafka- Part 1
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
 
Apache Big Data EU 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data EU 2016: Next Gen Big Data Analytics with Apache ApexApache Big Data EU 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data EU 2016: Next Gen Big Data Analytics with Apache Apex
 
Stream Processing using Apache Spark and Apache Kafka
Stream Processing using Apache Spark and Apache KafkaStream Processing using Apache Spark and Apache Kafka
Stream Processing using Apache Spark and Apache Kafka
 
Archive integration with RDF
Archive integration with RDFArchive integration with RDF
Archive integration with RDF
 

Similar to Solution for events logging with akka streams and kafka

Fluentd Overview, Now and Then
Fluentd Overview, Now and ThenFluentd Overview, Now and Then
Fluentd Overview, Now and ThenSATOSHI TAGOMORI
 
Introduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperIntroduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperRahul Jain
 
Near Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark StreamingNear Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark StreamingDibyendu Bhattacharya
 
Apache frameworks for Big and Fast Data
Apache frameworks for Big and Fast DataApache frameworks for Big and Fast Data
Apache frameworks for Big and Fast DataNaveen Korakoppa
 
Building Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache KafkaBuilding Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache KafkaSlim Baltagi
 
Building streaming data applications using Kafka*[Connect + Core + Streams] b...
Building streaming data applications using Kafka*[Connect + Core + Streams] b...Building streaming data applications using Kafka*[Connect + Core + Streams] b...
Building streaming data applications using Kafka*[Connect + Core + Streams] b...Data Con LA
 
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...Athens Big Data
 
Real time data pipline with kafka streams
Real time data pipline with kafka streamsReal time data pipline with kafka streams
Real time data pipline with kafka streamsYoni Farin
 
Apache Kafka with Spark Streaming: Real-time Analytics Redefined
Apache Kafka with Spark Streaming: Real-time Analytics RedefinedApache Kafka with Spark Streaming: Real-time Analytics Redefined
Apache Kafka with Spark Streaming: Real-time Analytics RedefinedEdureka!
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkRahul Jain
 
xPatterns on Spark, Shark, Mesos, Tachyon
xPatterns on Spark, Shark, Mesos, TachyonxPatterns on Spark, Shark, Mesos, Tachyon
xPatterns on Spark, Shark, Mesos, TachyonClaudiu Barbura
 
Learning spark ch10 - Spark Streaming
Learning spark ch10 - Spark StreamingLearning spark ch10 - Spark Streaming
Learning spark ch10 - Spark Streamingphanleson
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013mumrah
 
Distributed & Highly Available server applications in Java and Scala
Distributed & Highly Available server applications in Java and ScalaDistributed & Highly Available server applications in Java and Scala
Distributed & Highly Available server applications in Java and ScalaMax Alexejev
 
Developing Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache KafkaDeveloping Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache KafkaJoe Stein
 
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...Trivadis
 

Similar to Solution for events logging with akka streams and kafka (20)

Fluentd Overview, Now and Then
Fluentd Overview, Now and ThenFluentd Overview, Now and Then
Fluentd Overview, Now and Then
 
Introduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperIntroduction to Kafka and Zookeeper
Introduction to Kafka and Zookeeper
 
Near Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark StreamingNear Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
 
Apache frameworks for Big and Fast Data
Apache frameworks for Big and Fast DataApache frameworks for Big and Fast Data
Apache frameworks for Big and Fast Data
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Building Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache KafkaBuilding Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache Kafka
 
Hadoop Vectored IO
Hadoop Vectored IOHadoop Vectored IO
Hadoop Vectored IO
 
Event driven-arch
Event driven-archEvent driven-arch
Event driven-arch
 
Building streaming data applications using Kafka*[Connect + Core + Streams] b...
Building streaming data applications using Kafka*[Connect + Core + Streams] b...Building streaming data applications using Kafka*[Connect + Core + Streams] b...
Building streaming data applications using Kafka*[Connect + Core + Streams] b...
 
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
 
Real time data pipline with kafka streams
Real time data pipline with kafka streamsReal time data pipline with kafka streams
Real time data pipline with kafka streams
 
Apache Kafka with Spark Streaming: Real-time Analytics Redefined
Apache Kafka with Spark Streaming: Real-time Analytics RedefinedApache Kafka with Spark Streaming: Real-time Analytics Redefined
Apache Kafka with Spark Streaming: Real-time Analytics Redefined
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache Spark
 
xPatterns on Spark, Shark, Mesos, Tachyon
xPatterns on Spark, Shark, Mesos, TachyonxPatterns on Spark, Shark, Mesos, Tachyon
xPatterns on Spark, Shark, Mesos, Tachyon
 
Learning spark ch10 - Spark Streaming
Learning spark ch10 - Spark StreamingLearning spark ch10 - Spark Streaming
Learning spark ch10 - Spark Streaming
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
 
Distributed & Highly Available server applications in Java and Scala
Distributed & Highly Available server applications in Java and ScalaDistributed & Highly Available server applications in Java and Scala
Distributed & Highly Available server applications in Java and Scala
 
Developing Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache KafkaDeveloping Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache Kafka
 
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
 
Introduction to Apache Beam
Introduction to Apache BeamIntroduction to Apache Beam
Introduction to Apache Beam
 

Recently uploaded

Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptkotipi9215
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
software engineering Chapter 5 System modeling.pptx
software engineering Chapter 5 System modeling.pptxsoftware engineering Chapter 5 System modeling.pptx
software engineering Chapter 5 System modeling.pptxnada99848
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfPower Karaoke
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 

Recently uploaded (20)

Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.ppt
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
software engineering Chapter 5 System modeling.pptx
software engineering Chapter 5 System modeling.pptxsoftware engineering Chapter 5 System modeling.pptx
software engineering Chapter 5 System modeling.pptx
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdf
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 

Solution for events logging with akka streams and kafka

  • 1. Solution for events logging How does Akka Streams in couple with Kafka make it easier to manage data flows
  • 2. 2Sementsov A., 2020 Architecture: Step 1 • Client communicate with server using Kafka broker • Consumer writes each record in both storages: PostgreSQL and Hadoop (possible Hive) • PostgreSQL used for operational data with maximum storage period of about 1-3 months. It is the storage with fast search capability • Hadoop and related components used as cheaper storage, but with slower access Bird's eye view Challenges 1. It is difficult to maintain consistency when writing to multiple repositories simultaneously 2. Consumer must use existing access rights provider, monitoring and logging system 3. How to reduce the amount of developed code and speed up the process of adding new repositories 4. How to choose the optimal storage format from all the possible options offered by the Hadoop and related products 5. No vendor lock-in allowed
  • 3. 3Sementsov A., 2020 Architecture: Step 2 Finish solution 1. Consistency – Kafka provides functionality to create several groups of consumers. Each group process all messages independently 2. There are many tools such as Fluent, LogStash, Flume provides “get -> process -> put” functionality. However for two main reasons (Refusal to vendor lock-in and integration with our proprietary systems) we decided to develop our own 3. We chose Akka Streams library to reduce the amount of developed code and speed up the process of adding new storages 4. Hadoop write stages 1. Consumer service writes files directly to HDFS in Apache Avro format. Pros Avro - fast write and compression. Cons – it has no indexes, so the search is very slow 2. The Apache Oozie plans a task to convert Avro files into ORC table using Hive. Apache ORC files has several advantages: • Has indexes • Allowed columnar encryption and/or masking Cons • File can be written only once because indexes must be added at the end of file 5. Both PostgreSQL and Hive have JDBC drivers to access data
  • 4. 4Sementsov A., 2020 Akka Streams: Consumer application architecture Consumer application • HTTP endpoints protected by desired access provider. Endpoint developed using Akka Http • HTTP endpoint backed by KafkaProcessor Actor. Actor can process two type of messages – Start and Stop • KafkaProcessor can start & stops several kind of streams: • PostgresFlow • HdfsAvroFlow • Trait BaseKafkaFlow provides common functions: • Logging & monitoring functionality • Get messages from and commit to Kafka • Extension points to parse and store messages • Each of final streams extends BaseKafkaFlow trait and implements flow stages for • Parser • Store
  • 5. 5Sementsov A., 2020 Akka Streams coding: Base classes These two traits – BaseProcessorFlow and BaseKafkaFlow provide functions which cover entire processing procedure. Type parameters In and P allowed to use different sources and process different messages For BaseKafkaFlow source type is KafkaMessage, which defined as BaseProcessorFlow has two abstract values • parse – will be used to convert incoming value into PassThrough[In, P] • saveBatch – will be used to store batch data and transfer all processed data to downstream. This stage can be cause of backpressure BaseKafkaFlow extends base processor by adding ability to receive and send messages. BaseKafkaFlow is an Actor. Also this class have states. BaseKafkaFlow can process the following messages: • StartKafkaConsumer(startConfig) – Message contains parameters to create and start stream connected to Kafka brokers. Stream can read message from Kafka, processing its and committing offset to Kafka partitions when messages processes successfully. Moreover, startConfig contains information about consumer group. • StopKafkaConsumer - Message used to stop all streams gracefully Main application actor can start many actors of different classes derived from BaseKafkaFlow
  • 6. 6Sementsov A., 2020 Akka Streams coding: Storages PostgresFlow implements parser to construct SomeEvent object stored as JSON. If JSON parsing was not success then constructs SomeEvent object which contains error. saveBatch stores data into PostgreSQL database. In the case when the data cannot be saved due to a database error, this function has repeatedly tried to save the data. To avoid CPU overload, it has an ExponentialBackoff to control the time between attempts. The whole batch is saved in one transaction in function saveEvents
  • 7. 7Sementsov A., 2020 Akka Streams coding: Storages To store Avro files we need more complex program code. For this purpose using custom flow processing. HdfsAvroFileFlow defines FlowStage HdfsAvroFileFlowLogic defines processing logic. It is necessary because we need state to store output avro stream. One stream = one file in HDFS. In fact this logic is very simple. Open file is it not open and write file if it open. From time to time, the file is forced to close, rotation occurs writeAvroEvents function writes data and pass successfully written data to the downstream. Due to this action only data which actually written to HDFS will be committed to Kafka However, there is one point to consider. Kafka stream will not reread uncommitted records. To do this we need to restart stream. It can be done either send special message to actor or by throwing Exception during this stage
  • 8. 8Sementsov A., 2020 Links • Akka main site - https://akka.io/ • Akka Streams - https://doc.akka.io/docs/akka/current/stream/index.htm • Author email -anatolse@gmail.com