Lambda Architecture using Apache Spark – with Java code examples

Read More on: http://blogs.quovantis.com/ Visit Site: www.quovantis.com

Lambda architecture, devised by Nathan Marz, is a layered architecture that addresses the
problem of computing arbitrary functions on arbitrary data in real time. In a real-time system
the requirement looks like this:
result = function (all data)
As the volume of data grows, running such a query over the whole dataset takes a significant
amount of time, no matter how many resources we use.
Lambda Architecture uses a three-layer design and the concept of pre-computed views to
solve this problem. The three layers are:
● Batch Layer
● Speed Layer
● Serving Layer

Batch Layer
The batch layer stores the immutable master dataset, computes arbitrary functions over all of it, and produces batch views.
The function of the batch layer can be summarized as
batch view = function (all data)
The batch layer runs this job continuously, recomputing and updating the batch views.
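As a rough illustration of "batch view = function (all data)", the sketch below keeps an append-only list as the master dataset and recomputes a count view from scratch on every run. The class and method names are hypothetical, and an in-memory list stands in for real distributed storage.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical names throughout; an in-memory stand-in for a real batch job.
final class BatchLayerSketch {
    // Append-only master dataset: records are added, never updated or deleted.
    private final List<String> masterData = new ArrayList<>();

    void append(String record) {
        masterData.add(record);
    }

    // batch view = function(all data): recompute the whole view from scratch.
    // Here the "function" counts how many times each record value occurs.
    Map<String, Long> computeBatchView() {
        Map<String, Long> view = new TreeMap<>();
        for (String record : masterData) {
            view.merge(record, 1L, Long::sum);
        }
        return Collections.unmodifiableMap(view);
    }

    public static void main(String[] args) {
        BatchLayerSketch batch = new BatchLayerSketch();
        batch.append("home");
        batch.append("about");
        batch.append("home");
        System.out.println(batch.computeBatchView()); // {about=1, home=2}
    }
}
```

Recomputing from scratch is what makes each run slow on large data, which is the gap the speed layer is meant to cover.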

Serving Layer
The purpose of the serving layer is to store the batch views produced by the batch layer and provide random access to them.
When the batch layer computes new views, it updates them in the serving layer.
The serving layer can be implemented with a random-access database.
Speed Layer
While the batch layer is recomputing the batch views, data that arrives during the recomputation is not included in them.
The purpose of the speed layer is to compute incremental views on this recent data, which the batch views do not yet cover.
These views are called real-time views.
The speed layer can be summarized as
real time view = function (real time view, new data)
So the final query is answered by combining results from the serving layer and the speed layer:
batch view = function (all data)
real time view = function (real time view, new data)
result = merge (query (batch view), query (real time view))
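The last two formulas can be sketched in a few lines of plain Java. This is only an illustration under simple assumptions: both views are in-memory maps from a key to a count, the names are hypothetical, and "merge" is taken to be addition because the views hold counts.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names; both views are plain in-memory maps of key -> count.
final class LambdaQuerySketch {
    // real time view = function(real time view, new data):
    // fold one new record into the existing view instead of recomputing.
    static Map<String, Long> updateRealTimeView(Map<String, Long> realTimeView, String newRecord) {
        Map<String, Long> updated = new HashMap<>(realTimeView);
        updated.merge(newRecord, 1L, Long::sum);
        return updated;
    }

    // result = merge(query(batch view), query(real time view)):
    // for counts, merging the two partial answers is addition.
    static long query(Map<String, Long> batchView, Map<String, Long> realTimeView, String key) {
        return batchView.getOrDefault(key, 0L) + realTimeView.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        Map<String, Long> batchView = Map.of("home", 2L, "about", 1L);
        Map<String, Long> realTimeView = new HashMap<>();
        realTimeView = updateRealTimeView(realTimeView, "home");
        realTimeView = updateRealTimeView(realTimeView, "pricing");
        System.out.println(query(batchView, realTimeView, "home"));    // 3
        System.out.println(query(batchView, realTimeView, "pricing")); // 1
    }
}
```

A real system also has to discard real-time entries once a batch run has absorbed the same data, otherwise they would be counted twice; that step is not shown here.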

An Example using Apache Spark
Suppose we want to build a system that finds popular hashtags in a Twitter stream. We can implement the Lambda Architecture
using Apache Spark to build this system.
Batch Layer Implementation - The batch layer reads a file of tweets, calculates a hashtag frequency map, and saves
it to a Cassandra database table.
Batch.java
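The computation this step describes can be sketched without Spark or Cassandra. The block below is a dependency-free stand-in in plain Java, not the Batch.java listing itself: the class name, the hashtag pattern, and the lower-casing rule are all assumptions, and reading the tweet file (for example with Files.readAllLines) and writing the result to Cassandra are left to the caller.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Dependency-free stand-in for the batch job; not the original Batch.java.
final class HashtagBatchSketch {
    // Assumption: a hashtag is '#' followed by letters, digits, or underscores.
    private static final Pattern HASHTAG = Pattern.compile("#\\w+");

    // Build the hashtag -> frequency map over every tweet in the input.
    static Map<String, Long> hashtagFrequencies(List<String> tweets) {
        Map<String, Long> frequencies = new HashMap<>();
        for (String tweet : tweets) {
            Matcher m = HASHTAG.matcher(tweet);
            while (m.find()) {
                // Lower-case so that #Spark and #spark count as the same tag.
                frequencies.merge(m.group().toLowerCase(Locale.ROOT), 1L, Long::sum);
            }
        }
        return frequencies;
    }

    public static void main(String[] args) {
        List<String> tweets = List.of(
                "Trying out #Spark with #Java",
                "#spark streaming is neat",
                "no tags here");
        // In the described system this map would be written to a Cassandra table.
        System.out.println(hashtagFrequencies(tweets).get("#spark")); // 2
    }
}
```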

Speed Layer Implementation - The speed layer can also be written in Apache Spark, using the Spark Streaming feature.
We read a stream of recent tweets and calculate the real-time view from it. For simplicity, we also save this
real-time view to Cassandra.
Speed.java
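Spark Streaming processes a stream as a series of small batches, so the speed layer's job can be approximated in plain Java by folding each micro-batch of tweets into a running count. The block below is a dependency-free sketch, not the Speed.java listing itself: the class name and the whitespace-based hashtag rule are assumptions, and the in-memory map stands in for the Cassandra table.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Dependency-free stand-in for the streaming job; not the original Speed.java.
final class HashtagSpeedSketch {
    // Real-time view: hashtag -> count over tweets seen since the last batch run.
    private final Map<String, Long> realTimeView = new ConcurrentHashMap<>();

    // Fold one micro-batch of tweets into the existing view.
    void processMicroBatch(List<String> tweets) {
        for (String tweet : tweets) {
            for (String token : tweet.split("\\s+")) {
                // Assumption: any whitespace-delimited token starting with '#'
                // and longer than one character is a hashtag.
                if (token.length() > 1 && token.startsWith("#")) {
                    realTimeView.merge(token.toLowerCase(Locale.ROOT), 1L, Long::sum);
                }
            }
        }
    }

    // Immutable snapshot of the current real-time view.
    Map<String, Long> view() {
        return Map.copyOf(realTimeView);
    }

    public static void main(String[] args) {
        HashtagSpeedSketch speed = new HashtagSpeedSketch();
        speed.processMicroBatch(List.of("#Spark rocks"));
        speed.processMicroBatch(List.of("more #spark", "#java too"));
        System.out.println(speed.view().get("#spark")); // 2
    }
}
```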

Serving Layer Implementation - The serving layer can be implemented as a RESTful web service that queries the
Cassandra tables to return the final result in real time.
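A minimal sketch of such a service is shown below, using the HTTP server that ships with the JDK (com.sun.net.httpserver). Everything specific here is an assumption made for illustration: the class name, the /hashtags/top path, the port passed to start, the plain-text response, and the two maps that stand in for the Cassandra tables.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical serving-layer sketch; the maps stand in for Cassandra tables.
final class HashtagServingSketch {
    // Merge the two views by adding counts, then return the n most frequent tags.
    // Ties are broken alphabetically so the result is deterministic.
    static List<String> topHashtags(Map<String, Long> batchView,
                                    Map<String, Long> realTimeView, int n) {
        Map<String, Long> merged = new HashMap<>(batchView);
        realTimeView.forEach((tag, count) -> merged.merge(tag, count, Long::sum));
        return merged.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    // Expose the query over HTTP: GET /hashtags/top returns a comma-separated list.
    static HttpServer start(int port, Map<String, Long> batchView,
                            Map<String, Long> realTimeView) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/hashtags/top", exchange -> {
            byte[] body = String.join(",", topHashtags(batchView, realTimeView, 3))
                    .getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) {
        Map<String, Long> batchView = Map.of("#spark", 10L, "#java", 7L);
        Map<String, Long> realTimeView = Map.of("#spark", 2L, "#kafka", 8L);
        // Prints the merged ranking; call start(8080, ...) to serve it over HTTP.
        System.out.println(topHashtags(batchView, realTimeView, 3)); // [#spark, #kafka, #java]
    }
}
```

The main method only prints the merged ranking so that it terminates; start is there to show how the same query could sit behind an HTTP endpoint.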

References and image credits
http://www.databasetube.com/database/big-data-lambda-architecture/
Nathan Marz and James Warren, Big Data: Principles and best practices of scalable realtime data systems
