Event-Driven Architecture and Apache Kafka were discussed. Key points:
- Event-driven systems allow asynchronous, decoupled communication between services using message queues.
- Apache Kafka is a distributed streaming platform for publishing and subscribing to streams of records across a cluster of servers. It provides reliability through replication and scales horizontally.
- Kafka offers the message-queue advantages of decoupling, scalability, and fault tolerance. Unlike request/response APIs, it lets data be published and consumed independently.
2. Agenda
● Introduction to Event Driven Systems
● Introduction to different problems that arise when using Event Driven Systems
● Different patterns
● Apache Kafka
● Org-wide implementations
3. Let's Build Scamazon
Disclaimer: The session will have stupid poor jokes scattered all over. Forgive the incompetent joker in me.
4. Developer report
My code provisions the order quickly. While investigating, I observed that sending the order-success email takes a lot of time.
Customer report
Placing an order on Scamazon is such a pain. It takes too long.
Latency: Time it takes to complete one request
Throughput: Number of requests that can be completed per unit of time
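A small worked example may help keep the two terms apart. The numbers (a 200 ms request, four workers) and the function name are made up for illustration, and the sketch assumes workers are fully independent:

```python
def throughput_per_second(latency_ms, workers=1):
    """Requests completed per second when each request takes latency_ms
    and `workers` requests are processed in parallel (idealized)."""
    return workers * (1000 / latency_ms)

# One worker: a 200 ms latency caps throughput at 5 requests per second.
single = throughput_per_second(200)

# Four workers: throughput rises, but each request still takes 200 ms.
parallel = throughput_per_second(200, workers=4)

print(single, parallel)
```

Adding workers improves throughput only; the slow order-success email still sets the latency each customer sees.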
6. Scamazon Customer report
I placed an order for the last samsong smart watch before my friend did yesterday. But he got the watch and I didn't.
Annoying Business manager
What? How can that happen? Devs, fix it.
7. Transaction Ordering
1. Have a distributed locking mechanism
2. Ensure the message queue inherently supports ordering
8. Customer report
On your Great Scamazon sale, I purchased a shoe. I never got it.
Apparently Microservices are in trend now!
Business Manager
Each customer we lose is one gained for FlopKart! Jeff Bozo is going to kill us. Devs, Do Something !!
Developer
Delivery Service was down because some SysAd upgraded something. Blame them! My code is gold.
11. Message queues
1. Message queueing allows applications to communicate asynchronously by sending messages to each other.
2. Messages may or may not be delivered in FIFO order.
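The asynchronous hand-off can be sketched with Python's standard-library `queue.Queue`, standing in for a real message broker. The function names and the "mail sent" payload are illustrative assumptions, not part of the talk:

```python
import queue
import threading

def producer(q, orders):
    """Publish each order and return without waiting for it to be processed."""
    for order in orders:
        q.put(order)
    q.put(None)  # sentinel: tells the consumer there is nothing more to read

def consumer(q, processed):
    """Pull messages one at a time until the sentinel arrives."""
    while True:
        message = q.get()  # blocks until a message is available
        if message is None:
            break
        processed.append(f"mail sent for order {message}")

def run_demo(orders):
    q = queue.Queue()  # queue.Queue delivers in FIFO order
    processed = []
    worker = threading.Thread(target=consumer, args=(q, processed))
    worker.start()
    producer(q, orders)  # the producer never calls the consumer directly
    worker.join()
    return processed

print(run_demo([1, 2, 3]))
```

With a single consumer thread the FIFO queue preserves order. With several consumers, processing order is no longer guaranteed, which is the caveat in point 2.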
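12. Message queues vs. APIs
APIs are 2-way communication: the caller sends a request and waits for a response.
Message Queues are 1-way communication: the sender publishes a message and does not wait for a reply.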
13. Some theory
● Messages: Messages can be commands or events. Each message can have a type and payload data to be sent across.
● Channels/Queues: Channels/Queues are delivery points the messages are sent to.
● Event Dispatcher/Producers: The duty of the dispatcher is to register channels and produce events/messages.
● Event Handlers/Consumers: Event handlers act as destination points for receiving events, as channels do.
● Dynamic Routers: The harmony of a messaging system comes from its routers. Routers are responsible for selecting the proper path for a given message.
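The roles above can be sketched as a tiny in-memory messaging layer. The class and method names (`Message`, `Router`, `Dispatcher`, `subscribe`, `dispatch`) are hypothetical and chosen only to mirror the bullet points; a real broker adds persistence and networking:

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class Message:
    type: str                                   # e.g. "order_placed"
    payload: dict = field(default_factory=dict)

class Router:
    """Selects the channel a message should travel on, based on its type."""
    def __init__(self):
        self._routes = {}

    def add_route(self, message_type, channel):
        self._routes[message_type] = channel

    def route(self, message):
        return self._routes[message.type]

class Dispatcher:
    """Registers handlers per channel and delivers messages to them."""
    def __init__(self, router):
        self._router = router
        self._handlers = defaultdict(list)      # channel name -> handlers

    def subscribe(self, channel, handler):
        self._handlers[channel].append(handler)

    def dispatch(self, message):
        channel = self._router.route(message)
        for handler in self._handlers[channel]:
            handler(message)

def demo():
    router = Router()
    router.add_route("order_placed", "orders")
    dispatcher = Dispatcher(router)
    received = []
    dispatcher.subscribe("orders", received.append)  # handler = consumer
    dispatcher.dispatch(Message("order_placed", {"orderid": 1}))
    return received

print(demo())
```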
14. Review Message Queue Advantages
1. Decoupled architecture: The sender’s job is to send messages to the queue and the receiver’s job is to receive messages from the queue. This separation allows components to operate independently while still communicating with each other.
2. Fault tolerance: If the consumer server is down, the messages stay in the queue until the server comes back up and retrieves them.
3. Scalability: We can push all the messages to the queue and configure the rate at which the consumer server pulls from it. This way, the receiver won’t get overloaded with network requests and can instead maintain a steady stream of data to process.
4. Transaction ordering: Use patterns that ensure messages are delivered in the order they were published, if the use case needs it.
18. Event Sourcing
● Capture all changes to an application state as a sequence of events.
● Events define the state of the system.
● Like git, it doesn't store just the latest version of the code, but maintains the changes from the base version.
● Designed for fast, low-latency applications (e.g. stock trading).
● If the server goes down, we can replay the events and get the state back.
● Events may or may not be stored in an EventStore (e.g. a broker).
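As a rough sketch of "replay the events and get the state back": the state below is never stored directly, only derived by folding over the event list. The event shape `(kind, item, quantity)` and the inventory example are assumptions chosen for illustration.

```python
def apply_event(state, event):
    """Return the new state produced by applying one event to the old state."""
    kind, item, qty = event
    new_state = dict(state)
    if kind == "added":
        new_state[item] = new_state.get(item, 0) + qty
    elif kind == "removed":
        new_state[item] = new_state.get(item, 0) - qty
    return new_state


def replay(events):
    """Rebuild state from scratch by applying every event in order."""
    state = {}
    for event in events:
        state = apply_event(state, event)
    return state


event_store = [
    ("added", "shoe", 2),
    ("added", "sock", 3),
    ("removed", "shoe", 1),
]

current = replay(event_store)       # state after all events
earlier = replay(event_store[:2])   # state as it was before the last event
```

Replaying a prefix of the log gives the state at an earlier point in time, which is the same idea as checking out an older commit in git.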
19. Event sourcing Advantages/Disadvantages
Advantages
1. Entire applications can be in memory
2. Parallel model
Disadvantages
1. Event schemas
2. Interaction with non-event-sourcing services
21. Apache Kafka
Distributed Streaming Platform
● Originally developed at LinkedIn.
● Used for building real-time streaming data pipelines that reliably
get data between systems or applications.
● Used for building real-time streaming applications that transform
or react to streams of data.
1. Channels a.k.a. Topics
2. Brokers a.k.a. Routers
22. Records in Kafka
Each record (message) in Kafka has:
1. Key
2. Value
3. Timestamp
Examples:
Key              Value                       Timestamp
null             {"orderid": 1, "cid": 5}    1322468906767
8461             Abhishek                    1592468905404
resellerhosting  {"cid": 5, "addons": []}    1492468905404
23. Topic
1. Partition: An ordered, immutable sequence of records that is continually appended to.
2. Offset: Uniquely identifies each record within the partition.
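A toy model of a partition, assuming an in-memory list as the log. The `Record` and `Partition` classes are illustrative names, not Kafka client classes; the two records are taken from the examples in the slides.

```python
import time
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Record:
    key: Optional[str]
    value: Any
    timestamp: int  # milliseconds since the epoch


class Partition:
    """Append-only sequence of records; the list index plays the role of the offset."""

    def __init__(self):
        self._records = []

    def append(self, key, value, timestamp=None):
        if timestamp is None:
            timestamp = int(time.time() * 1000)
        self._records.append(Record(key, value, timestamp))
        return len(self._records) - 1  # offset assigned to the new record

    def read(self, offset):
        return self._records[offset]


partition = Partition()
first = partition.append(None, {"orderid": 1, "cid": 5}, 1322468906767)
second = partition.append("8461", "Abhishek", 1592468905404)
```

Offsets only ever grow, and existing records are never rewritten, so an offset always points at the same record.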
24. How is Kafka resilient?
1. Kafka runs as a cluster of servers (brokers).
2. Partitions are replicated across servers.
3. Each server functions as the leader for a fair share of partitions.
4. Followers passively replicate the leader.
5. Each Kafka server hosts both leader and follower partitions.
[Diagram: partitions P1, P2 and P3 on Broker 1 and Broker 2]
25. Can you Explain?
Scenario 1:
Brokers: 3
Partitions: 3
Replication Factor: 2
Broker 1: P1, P3
Broker 2: P2, P1
Broker 3: P3, P2
Scenario 2:
Brokers: 3
Partitions: 2
Replication Factor: 3
Broker 1: P1, P2
Broker 2: P2, P1
Broker 3: P1, P2
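One way to reason about the two scenarios is a simple round-robin placement: replica r of partition p goes to broker (p + r) mod N. This is a simplified sketch, not Kafka's actual assignment algorithm (which also considers things such as rack placement); the function name `assign_replicas` is hypothetical.

```python
def assign_replicas(brokers, partitions, replication_factor):
    """Place each partition's replicas on consecutive brokers, wrapping around."""
    if replication_factor > brokers:
        raise ValueError("replication factor cannot exceed the number of brokers")
    layout = {b: [] for b in range(1, brokers + 1)}
    for p in range(partitions):
        for r in range(replication_factor):
            broker = (p + r) % brokers + 1
            layout[broker].append(f"P{p + 1}")
    return layout


scenario_1 = assign_replicas(brokers=3, partitions=3, replication_factor=2)
scenario_2 = assign_replicas(brokers=3, partitions=2, replication_factor=3)
```

In scenario 1 each broker ends up holding two of the three partitions; in scenario 2 every broker holds a copy of both partitions, because the replication factor equals the number of brokers.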
26. APIs provided by Kafka
1. Producer API: Publish messages to one or more topics.
2. Consumer API: Subscribe to a topic and process its messages.
3. Connect API: Bind existing systems like databases to Kafka topics.
4. Streams API: Consume a stream of data and produce one or more output streams.
5. Admin API: Manage topics, brokers and other Kafka objects.
28. Kafka Producers
● Producers publish records to Kafka topics using the producer API that Kafka provides.
● Any number of producers can publish records to the same topic.
● Producers can decide which partition of the topic each record is published to.
○ For the normal use case and better load balancing, they follow round-robin allocation.
● Batch processing on the producer side: why, and at what cost?
● How are records routed from the producer to the leader of the target partition?
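A hedged sketch of how a producer might pick a partition: round-robin when the record has no key, and a hash of the key otherwise so that records with the same key always land in the same partition. The `Partitioner` class is illustrative, and `zlib.crc32` is only a stand-in for the hash; Kafka's own default partitioner uses a different hash function.

```python
import itertools
import zlib


class Partitioner:
    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._counter = itertools.count()  # drives round-robin for unkeyed records

    def partition_for(self, key):
        if key is None:
            # No key: spread records evenly across partitions.
            return next(self._counter) % self.num_partitions
        # Keyed: same key -> same partition (crc32 is an assumed stand-in hash).
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions


partitioner = Partitioner(num_partitions=3)
unkeyed = [partitioner.partition_for(None) for _ in range(4)]
keyed = [partitioner.partition_for("resellerhosting") for _ in range(3)]
```

Keyed placement is what makes per-key ordering possible, since all records for one key go through a single partition.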
29. Why Zookeeper?
Coordination Service for Distributed Applications
● ZooKeeper provides primitives to support:
○ Distributed Configuration Service
○ Synchronization
○ Cluster Management
○ Service Discovery
● Optimized more for reads than for writes.
● How does Kafka use ZooKeeper?
○ Controller election
○ Configuration of topics
○ Membership of the cluster
31. Why is Kafka not a simple queue?
1. Kafka durably persists all published records, whether or not they have been consumed, using a configurable retention period.
2. A consumer's position in the log (its offset) is controlled by the consumer.
3. A consumer can consume records in whatever order it wants.
4. A consumer can reset to an older offset to reprocess data, or skip ahead.
5. Kafka provides the features of both the queuing and pub-sub models through its partitioned, topic-based design.
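The consumer-controlled offset can be illustrated with a small in-memory model. The `Consumer` class below and its `poll` and `seek` methods are a toy written for this example (the names merely echo common client vocabulary); it is not the Kafka client API.

```python
class Consumer:
    """Reads from a shared log; the read position belongs to the consumer."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self, max_records=1):
        records = self.log[self.offset:self.offset + max_records]
        self.offset += len(records)
        return records

    def seek(self, offset):
        self.offset = offset


log = ["r0", "r1", "r2", "r3"]   # records stay in the log after being read
consumer = Consumer(log)

first_read = consumer.poll(2)    # reads r0 and r1
consumer.seek(0)                 # reset to an older offset to reprocess
replayed = consumer.poll(1)
consumer.seek(3)                 # skip ahead
skipped_ahead = consumer.poll(5)
```

Reading never removes anything from the log, which is the key difference from a queue where a consumed message is gone.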
32. Ordering of Records in Kafka
● Total ordering over records within a partition
● How to achieve total ordering over all data?
● Why did Kafka decide to provide ordering only within a partition?
Kafka provides better ordering than queues!?
34. Example of Enterprise Messaging in Endurance
● BLL implemented event driven architecture using Kafka.
● RP uses ActiveMQ’s Queues and Topics for messaging, notifications etc.
● CA uses messaging via ActiveMQ for CQRS.
● OrderBox uses Kafka for streaming changes from databases.
38. Why Kafka for Data Streaming (why not ActiveMQ)?
● Plethora of connectors to connect to multiple data sources.
● Its inherent architecture promotes its use as a log store.
● Kafka Streams provides a rich set of APIs for aggregation.
● This style of streaming is known as Change Data Capture (CDC).
○ More details can be found here
39. RP use cases
1. Message from Core Web layer to Executor Layer (Queue)
2. Cache eviction across containers of the same service (Topic)
3. Cache eviction across different services (Topic)
4. Can use Event-Carried State Transfer for cache updates (Topic)