Real-time Big Data Architectures
By
Dr. Sanjeev Kumar
Professor
Department of Computer Application
Contents
• Introduction
• The Lambda Architecture
• Pros and Cons of Lambda Architecture
• The Kappa Architecture
• Pros and Cons of The Kappa Architecture
• Conclusion
• References
Real-time Big Data Architectures, Dr. Sanjeev Kumar
Introduction to Big Data
When we talk about big data, we expect to push the limits on
volume, velocity, and possibly even variety of data.
Real-time data processing often requires qualities such as scalability,
fault tolerance, predictability, resiliency against stream imperfections,
and extensibility.
Architectures for Real-Time Big Data
The Lambda Architecture
• The batch layer stores the raw data as it arrives and computes the
batch views for consumption. Naturally, batch processes run on
some interval and are long-lived. The scope of the data is anywhere
from hours to years.
• The speed layer computes the real-time views that complement
the batch views.
The data stream entering the system is dual fed into both a
batch layer and a speed layer.
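The dual feed described above can be illustrated with a minimal in-memory sketch. It is not a production design: the class name `LambdaPipeline`, its methods, and the choice of a per-key event count as the "view" are all assumptions made for illustration, and a real deployment would use distributed storage and a stream processor instead of Python lists and dictionaries.

```python
from collections import defaultdict


class LambdaPipeline:
    """Toy Lambda architecture (hypothetical names): each event feeds both layers."""

    def __init__(self):
        self.master_log = []                # batch layer: raw data kept as it arrives
        self.batch_view = {}                # view recomputed on every batch cycle
        self.speed_view = defaultdict(int)  # real-time counts since the last batch run

    def ingest(self, key):
        # Dual feed: the same event goes to the batch store and to the speed layer.
        self.master_log.append(key)
        self.speed_view[key] += 1

    def run_batch(self):
        # Batch cycle: recompute the view from all raw data, then reset the
        # speed layer, because the batch view now covers those events.
        counts = defaultdict(int)
        for key in self.master_log:
            counts[key] += 1
        self.batch_view = dict(counts)
        self.speed_view.clear()

    def query(self, key):
        # A query merges the batch view with the real-time view.
        return self.batch_view.get(key, 0) + self.speed_view.get(key, 0)


pipeline = LambdaPipeline()
for page in ["home", "cart", "home"]:
    pipeline.ingest(page)
pipeline.run_batch()       # batch view now covers the first three events
pipeline.ingest("home")    # arrives after the batch run, visible via the speed layer
print(pipeline.query("home"))
```

Note that `run_batch` walks the whole log on every cycle, which is the re-processing cost usually attributed to this architecture.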
Pros and Cons of Lambda Architecture
Pros:
• The batch layer of the Lambda architecture manages historical data in
fault-tolerant distributed storage, which keeps the possibility of errors low
even if the system crashes.
• It offers a good balance of speed and reliability.
• It is a fault-tolerant and scalable architecture for data processing.
Pros and Cons of Lambda Architecture
Cons:
• It can result in coding overhead, because the processing logic has to be
implemented and maintained in both the batch layer and the speed layer.
• It re-processes the data on every batch cycle, which is not beneficial in
certain scenarios.
• Data modeled with the Lambda architecture is difficult to migrate or
reorganize.
The Kappa Architecture
• It focuses only on processing data as a stream. It is not a replacement
for the Lambda Architecture, except where it fits your use case.
• The idea is to handle both real-time data processing and continuous
reprocessing in a single stream processing engine.
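The idea of handling both real-time processing and reprocessing in one engine can be sketched as follows. This is an illustrative toy under stated assumptions: `EventLog`, `build_view`, `count_events`, and `sum_amounts` are hypothetical names, the event fields `user` and `amount` are invented for the example, and the in-memory list stands in for a replayable log or queue.

```python
class EventLog:
    """Append-only log standing in for a replayable queue (hypothetical)."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def replay(self, from_offset=0):
        # Reprocessing means reading the same log again from an earlier offset.
        return iter(self._events[from_offset:])


def build_view(log, process):
    """Run one stream-processing function over the log and return its view."""
    view = {}
    for event in log.replay():
        process(view, event)
    return view


def count_events(view, event):
    # Version 1 of the processing code: count events per user.
    view[event["user"]] = view.get(event["user"], 0) + 1


def sum_amounts(view, event):
    # Version 2 of the processing code: total amount per user.
    view[event["user"]] = view.get(event["user"], 0) + event["amount"]


log = EventLog()
log.append({"user": "ann", "amount": 10})
log.append({"user": "bob", "amount": 5})
log.append({"user": "ann", "amount": 7})

view_v1 = build_view(log, count_events)  # initial deployment
view_v2 = build_view(log, sum_amounts)   # code changed: replay the same log
print(view_v1, view_v2)
```

The same `build_view` path serves both the first run and the reprocessing run; only the processing function changes, so there is no separate batch code base to keep in sync.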
Kappa architecture can be deployed for those data
processing enterprise models where:
• Multiple data events or queries are logged in a queue to be served
against distributed file system storage or history.
• The order of the events and queries is not predetermined. Stream
processing platforms can interact with the database at any time.
Pros and Cons of Kappa Architecture
Pros:
• Kappa architecture can be used to develop data systems that are online
learners and therefore don’t need the batch layer.
• Re-processing is required only when the code changes.
• It can be deployed with fixed memory.
• It can be used for horizontally scalable systems.
• Fewer resources are required, as the machine learning is done in
real time.
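To make the "online learner" and "fixed memory" points concrete, here is a minimal sketch of about the simplest online estimator there is: a running mean that is updated one event at a time and never stores past events. The class name `RunningMean` is hypothetical, and a real online learner would track model parameters rather than a single average, but the memory behaviour is the same in kind.

```python
class RunningMean:
    """Online estimate of a mean using constant memory (hypothetical example)."""

    def __init__(self):
        self.n = 0       # number of events seen so far
        self.mean = 0.0  # current estimate

    def update(self, x):
        # Incremental update: the new mean depends only on the old mean,
        # the count, and the new value, so no history has to be kept.
        self.n += 1
        self.mean += (x - self.mean) / self.n
        return self.mean


estimator = RunningMean()
for reading in [2.0, 4.0, 6.0]:
    estimator.update(reading)
print(estimator.mean)
```

Because the state is just two numbers regardless of how many events arrive, a batch pass over historical data is not needed to keep the estimate current.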
Pros and Cons of Kappa Architecture
Cons:
• The absence of a batch layer might result in errors during data processing
or while updating the database, so an exception manager is required
to reprocess or reconcile the data.
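One possible shape for such an exception manager is sketched below, as an assumption rather than a prescribed design: events that fail are set aside with the reason for the failure, and can be fed through the same processing function again once repaired. The names `process_stream` and `add_amount`, and the event fields `user` and `amount`, are hypothetical.

```python
def process_stream(events, handler, view=None):
    """Apply handler to each event; collect failures instead of stopping."""
    view = {} if view is None else view
    failed = []  # events set aside for later reprocessing or reconciliation
    for event in events:
        try:
            handler(view, event)
        except (KeyError, TypeError, ValueError) as exc:
            failed.append((event, repr(exc)))
    return view, failed


def add_amount(view, event):
    # Raises KeyError if a field is missing, ValueError if amount is not numeric.
    view[event["user"]] = view.get(event["user"], 0.0) + float(event["amount"])


events = [
    {"user": "ann", "amount": "5"},
    {"user": "ann"},                 # missing field
    {"user": "bob", "amount": "x"},  # malformed value
]
view, failed = process_stream(events, add_amount)

# Reconciliation step: a repaired event is run through the same code path.
repaired = [{"user": "ann", "amount": "2"}]
view, still_failed = process_stream(repaired, add_amount, view)
print(view, len(failed), len(still_failed))
```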
Conclusion:
If you seek an architecture that is more reliable in updating the
data lake and efficient in devising machine learning models that
predict upcoming events in a robust manner, you should use the
Lambda Architecture, as it combines the benefits of the batch layer and
the speed layer to ensure both fewer errors and speed.
On the other hand, if you want to deploy a big data architecture on
less expensive hardware and need it to deal effectively with
unique events occurring at runtime, then select the Kappa
architecture for your real-time data processing needs.
Thanks
