Solution for events logging
How Akka Streams coupled with Kafka makes it easier to manage
data flows
Sementsov A., 2020
Architecture: Step 1
• Clients communicate with the server through a Kafka broker
• The consumer writes each record to both storages: PostgreSQL and
Hadoop (possibly Hive)
• PostgreSQL is used for operational data with a maximum retention
period of about 1-3 months. It is the storage with fast search
capability
• Hadoop and related components are used as cheaper storage, but
with slower access
Bird's eye view
Challenges
1. It is difficult to maintain consistency when writing to multiple
repositories simultaneously
2. The consumer must use the existing access rights provider, monitoring
and logging systems
3. How to reduce the amount of developed code and speed up
the process of adding new repositories
4. How to choose the optimal storage format from all the
possible options offered by Hadoop and related products
5. No vendor lock-in allowed
Architecture: Step 2
Final solution
1. Consistency: Kafka allows several consumer groups to be created.
Each group processes all messages independently
2. Many tools, such as Fluentd, Logstash and Flume, provide “get
-> process -> put” functionality. However, for two main reasons
(avoiding vendor lock-in and integrating with our proprietary
systems) we decided to develop our own
3. We chose the Akka Streams library to reduce the amount of developed
code and speed up the process of adding new storages
4. Hadoop write stages
  1. The consumer service writes files directly to HDFS in Apache Avro
  format. Pros of Avro: fast writes and compression. Cons: it has
  no indexes, so search is very slow
  2. Apache Oozie schedules a task that converts the Avro files into an ORC
  table using Hive. Apache ORC files have several advantages:
    • They have indexes
    • They allow column-level encryption and/or masking
  Cons:
    • A file can be written only once because the indexes must be
    added at the end of the file
5. Both PostgreSQL and Hive have JDBC drivers to access data
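Point 1 above relies on each consumer group keeping its own offsets. A minimal sketch of that idea follows, written in plain Python rather than against a real Kafka client. The `TopicLog` class and the group names `postgres-writer` and `hdfs-writer` are hypothetical and only illustrate why a slow HDFS writer does not hold back the PostgreSQL writer.

```python
from collections import defaultdict


class TopicLog:
    """In-memory stand-in for one Kafka topic partition (hypothetical, not a Kafka client)."""

    def __init__(self):
        self.records = []
        # Every consumer group tracks its own committed offset.
        self.committed = defaultdict(int)

    def append(self, record):
        self.records.append(record)

    def poll(self, group, max_records=10):
        """Return records after the group's committed offset, without committing."""
        start = self.committed[group]
        return self.records[start:start + max_records]

    def commit(self, group, count):
        """Advance the group's offset once its records have been stored."""
        self.committed[group] += count


log = TopicLog()
for i in range(5):
    log.append(f"event-{i}")

# The PostgreSQL writer stores everything in one go.
postgres_batch = log.poll("postgres-writer")
log.commit("postgres-writer", len(postgres_batch))

# The HDFS writer is slower and has stored only three records so far.
hdfs_batch = log.poll("hdfs-writer", max_records=3)
log.commit("hdfs-writer", len(hdfs_batch))

# The HDFS group still sees the two records it has not committed yet.
remaining_for_hdfs = log.poll("hdfs-writer")
```

Because the offsets are per group, a failure in one storage only delays that storage's group; the other group keeps its own position in the log.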
Akka Streams: Consumer application architecture
Consumer application
• HTTP endpoints are protected by the desired access provider.
The endpoints are developed using Akka HTTP
• The HTTP endpoint is backed by the KafkaProcessor actor. The actor
can process two types of messages: Start and Stop
• KafkaProcessor can start and stop several kinds of streams:
  • PostgresFlow
  • HdfsAvroFlow
• The BaseKafkaFlow trait provides common functions:
  • Logging and monitoring functionality
  • Getting messages from Kafka and committing to it
  • Extension points to parse and store messages
• Each of the final streams extends the BaseKafkaFlow trait and
implements flow stages for
  • Parser
  • Store
Akka Streams coding: Base classes
Two traits, BaseProcessorFlow and BaseKafkaFlow, provide functions which
cover the entire processing procedure.
The type parameters In and P allow different sources to be used and different
messages to be processed.
For BaseKafkaFlow, the source type is KafkaMessage.
BaseProcessorFlow has two abstract values:
• parse – converts an incoming value into PassThrough[In, P]
• saveBatch – stores a batch of data and passes all processed data
downstream. This stage can be a cause of backpressure
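The split between a fixed pipeline and two extension points can be sketched as follows. This is a plain Python approximation, not the deck's Scala code: the names `BaseProcessorFlow`, `PassThrough`, `parse` and `saveBatch` come from the slide, while the fields of `PassThrough`, the `batch_size` attribute, the `run` method and the `UpperCaseFlow` subclass are assumptions made for illustration. A generator stands in for the stream: records are pulled only when downstream asks for them, which loosely mirrors how a slow `saveBatch` slows the whole flow.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Generic, Iterable, Iterator, List, TypeVar

In = TypeVar("In")
P = TypeVar("P")


@dataclass
class PassThrough(Generic[In, P]):
    source: In  # original message, kept so it can be acknowledged later (assumed field)
    parsed: P   # parsed payload to be stored (assumed field)


class BaseProcessorFlow(ABC, Generic[In, P]):
    batch_size = 2  # assumed knob; the slides do not state the batch size

    @abstractmethod
    def parse(self, message: In) -> PassThrough[In, P]:
        """Convert an incoming value into a PassThrough."""

    @abstractmethod
    def save_batch(self, batch: List[PassThrough[In, P]]) -> List[PassThrough[In, P]]:
        """Store a batch and return the items that were actually processed."""

    def run(self, source: Iterable[In]) -> Iterator[PassThrough[In, P]]:
        """Fixed pipeline: parse, group into batches, save, emit downstream."""
        batch: List[PassThrough[In, P]] = []
        for message in source:
            batch.append(self.parse(message))
            if len(batch) >= self.batch_size:
                yield from self.save_batch(batch)
                batch = []
        if batch:
            yield from self.save_batch(batch)


class UpperCaseFlow(BaseProcessorFlow[str, str]):
    """Hypothetical concrete flow that 'stores' batches in a list."""

    def __init__(self):
        self.stored = []

    def parse(self, message):
        return PassThrough(source=message, parsed=message.upper())

    def save_batch(self, batch):
        self.stored.append([item.parsed for item in batch])
        return batch


flow = UpperCaseFlow()
processed = [item.source for item in flow.run(["a", "b", "c"])]
```

Adding a new storage then means writing only the two methods of a new subclass; batching and ordering stay in the base class.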
BaseKafkaFlow extends the base processor by adding the ability to receive and send
messages. BaseKafkaFlow is an actor, and it has states.
BaseKafkaFlow can process the following messages:
• StartKafkaConsumer(startConfig) – the message contains the parameters needed to create and
start a stream connected to the Kafka brokers. The stream reads messages from Kafka,
processes them and commits offsets to the Kafka partitions once the messages
have been processed successfully. startConfig also contains information about the
consumer group.
• StopKafkaConsumer – the message is used to stop all streams gracefully
The main application actor can start many actors of different classes derived from
BaseKafkaFlow
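The message protocol above can be pictured as a small state machine. The sketch below is plain Python, not Akka: the message names follow the slide, while the `KafkaFlowActor` class, the `idle` and `running` state names and the `group_id` key inside the start configuration are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StartKafkaConsumer:
    start_config: dict  # assumed shape, e.g. {"group_id": "..."}


@dataclass(frozen=True)
class StopKafkaConsumer:
    pass


class KafkaFlowActor:
    """Hypothetical stand-in for an actor derived from BaseKafkaFlow."""

    def __init__(self, name):
        self.name = name
        self.state = "idle"
        self.group_id = None

    def receive(self, message):
        if isinstance(message, StartKafkaConsumer) and self.state == "idle":
            # A real actor would build and run the Kafka stream here.
            self.group_id = message.start_config["group_id"]
            self.state = "running"
        elif isinstance(message, StopKafkaConsumer) and self.state == "running":
            # A real actor would drain and shut the stream down here.
            self.state = "idle"
        # Any other message/state combination is ignored.
        return self.state


# The main application starts one actor per storage, each in its own group.
actors = [KafkaFlowActor("PostgresFlow"), KafkaFlowActor("HdfsAvroFlow")]
for actor in actors:
    actor.receive(StartKafkaConsumer({"group_id": f"{actor.name}-group"}))
states_after_start = [actor.state for actor in actors]

for actor in actors:
    actor.receive(StopKafkaConsumer())
states_after_stop = [actor.state for actor in actors]
```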
Akka Streams coding: Storages
PostgresFlow implements the parser, which constructs a SomeEvent object from a message stored as JSON. If
JSON parsing fails, it constructs a SomeEvent object that contains the
error.
saveBatch stores data in the PostgreSQL database. When the data cannot
be saved because of a database error, the function retries the save
repeatedly. To avoid CPU overload, it uses an ExponentialBackoff to control the time
between attempts. The whole batch is saved in one transaction in the
saveEvents function
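The retry behaviour described above can be sketched like this. It is an illustration in plain Python, with the standard-library `sqlite3` module standing in for PostgreSQL. The `events` table, the backoff limits and the injectable `save` and `sleep` parameters are assumptions added to keep the example self-contained and testable.

```python
import sqlite3
import time


def save_events(conn, events):
    """Insert the whole batch in one transaction: all rows or none."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(
            "INSERT INTO events (payload) VALUES (?)", [(e,) for e in events]
        )


def save_batch(conn, events, save=save_events,
               min_backoff=0.1, max_backoff=10.0, sleep=time.sleep):
    """Retry until the batch is stored, doubling the pause after each failure."""
    delay = min_backoff
    attempts = 0
    while True:
        attempts += 1
        try:
            save(conn, events)
            return attempts
        except sqlite3.Error:
            sleep(delay)  # waiting keeps the retry loop from spinning the CPU
            delay = min(delay * 2, max_backoff)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (payload TEXT)")

# Simulate a database that is unavailable for the first two attempts.
failures = {"left": 2}


def flaky_save(connection, events):
    if failures["left"] > 0:
        failures["left"] -= 1
        raise sqlite3.OperationalError("simulated outage")
    save_events(connection, events)


delays = []  # record the pauses instead of actually sleeping
attempts = save_batch(conn, ["a", "b"], save=flaky_save, sleep=delays.append)
stored = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

While the function is retrying, it does not return, so nothing is passed downstream and no offsets are committed for the batch.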
Akka Streams coding: Storages
Storing Avro files requires more complex code. For this purpose we use
custom flow processing. HdfsAvroFileFlow defines the FlowStage, and
HdfsAvroFileFlowLogic defines the processing logic. This is necessary because we need
state to hold the output Avro stream. One stream = one file in HDFS. In fact, the logic is
very simple: open the file if it is not open, and write to it if it is. From time to time the
file is forced to close, and rotation occurs.
The writeAvroEvents function writes data and passes the successfully written data
downstream. As a result, only data that has actually been written to HDFS will be
committed to Kafka
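The open-if-needed, write, rotate logic can be sketched as below. This is plain Python with deliberate stand-ins: a local temporary directory replaces HDFS, JSON lines replace Avro, and rotation is triggered by a record count instead of time so that the example is deterministic. The class name, the file naming scheme and `max_records_per_file` are hypothetical.

```python
import json
import os
import tempfile


class RotatingFileWriter:
    """Keeps at most one output file open and rotates it periodically."""

    def __init__(self, directory, max_records_per_file=3):
        self.directory = directory
        self.max_records_per_file = max_records_per_file
        self.file = None
        self.records_in_file = 0
        self.file_index = 0

    def _open_if_needed(self):
        if self.file is None:
            name = f"events-{self.file_index:05d}.jsonl"
            self.file = open(os.path.join(self.directory, name), "w", encoding="utf-8")
            self.records_in_file = 0
            self.file_index += 1

    def close(self):
        if self.file is not None:
            self.file.close()
            self.file = None

    def write_events(self, events):
        """Write events and return only those that were actually written."""
        written = []
        for event in events:
            self._open_if_needed()
            self.file.write(json.dumps(event) + "\n")
            self.file.flush()
            self.records_in_file += 1
            written.append(event)
            if self.records_in_file >= self.max_records_per_file:
                self.close()  # rotation: the next event opens a new file
        return written


directory = tempfile.mkdtemp()
writer = RotatingFileWriter(directory, max_records_per_file=3)
events = [{"id": i} for i in range(5)]

# Only what write_events returns would be passed on for committing to Kafka.
to_commit = writer.write_events(events)
writer.close()
files = sorted(os.listdir(directory))
```

If a write raises, the events written before the failure are never returned, so their offsets stay uncommitted.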
However, there is one point to consider. The Kafka stream will not reread uncommitted
records on its own. To make it do so, we need to restart the stream. This can be done either by sending a special
message to the actor or by throwing an exception during this stage
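Why a restart makes the stream reread uncommitted records can be shown with a small simulation. It is plain Python, not Akka Streams or a Kafka client. The `OffsetStore`, `WriteFailed`, `run_stream` and `run_with_restarts` names and the `max_restarts` limit are hypothetical. The exception-based restart corresponds to the second option mentioned above.

```python
class WriteFailed(Exception):
    """Raised by the storage stage when a record cannot be written."""


class OffsetStore:
    """Stand-in for the offset committed to Kafka for one consumer group."""

    def __init__(self):
        self.committed = 0


def run_stream(records, offsets, write):
    """One stream run: always starts from the last committed offset."""
    for record in records[offsets.committed:]:
        write(record)            # may raise WriteFailed
        offsets.committed += 1   # commit only after a successful write


def run_with_restarts(records, offsets, write, max_restarts=3):
    """Restart the stream when the storage stage throws."""
    restarts = 0
    while True:
        try:
            run_stream(records, offsets, write)
            return restarts
        except WriteFailed:
            restarts += 1
            if restarts > max_restarts:
                raise


records = ["a", "b", "c", "d"]
written = []
failed_once = set()


def flaky_write(record):
    # Fail the first time "c" is seen, succeed on the retry.
    if record == "c" and record not in failed_once:
        failed_once.add(record)
        raise WriteFailed(record)
    written.append(record)


offsets = OffsetStore()
restarts = run_with_restarts(records, offsets, flaky_write)
```

After the failure on `c`, the restarted run begins at the committed offset, so `c` is read again and nothing is lost or written twice in this simplified model.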
Links
• Akka main site - https://akka.io/
• Akka Streams - https://doc.akka.io/docs/akka/current/stream/index.htm
• Author email - anatolse@gmail.com
