Lambda architecture, devised by Nathan Marz, is a layered architecture which solves the problem of computing arbitrary functions on arbitrary data in real time.
Read More on : http://blogs.quovantis.com/
Visit Site: www.quovantis.com
3. Lambda architecture, devised by Nathan Marz, is a layered architecture which solves the
problem of computing arbitrary functions on arbitrary data in real time. In a real time system
the requirement is something like this -
result = function (all data)
With increasing volume of data, the query will take a significant amount of time to execute no
matter what resources we have used.
Lambda Architecture uses three layer architecture and a concept of pre-computed views to
solve this problem. Three layers are
● Batch Layer
● Speed Layer
● Serving Layer
4.
5. Batch Layer
Batch layer stores immutable master data, computes arbitrary functions on all data and creates batch views.
Function of batch layer can be summarized as
batch view = function (all data)
Batch layer continuously does this job and updates batch views.
6. Traffic from Social Media
Serving Layer
Purpose of Serving Layer is to store batch views obtained from batch layer and provide random access to batch views.
When batch layer computes new views, they are updated in Serving Layer by Batch Layer.
The Serving Layer can be achieved by using a random access database.
Speed Layer
While batch layer computes batch view, it will not include data which came while re-computing batch views.
The purpose of Speed layer is to compute incremental views on recent data that is not included in batch views.
These views are called real time views.
A Speed Layer can be summarized as
real time view = function (real time view, new data)
So, our final query can be served by speed layer or serving layer.
batch view = function (all data)
real time view = function (real time view, new data)
result = merge (query (batch view), query (real time view))
7.
8. An Example using Apache Spark
Suppose we want to build a system to find popular hash tags in a twitter stream, we can implement lambda architecture
using Apache Spark to build this system.
Batch Layer Implementation - Batch layer will read a file of tweets and calculate hash tag frequency map and will save
it to Cassandra database table.
Batch.java
9. Speed Layer Implementation - Speed layer can also be written in Apache spark using spark streaming feature.
We can get a stream of recent tweets and calculate recent real time view from this stream we can also save this
real time view to Cassandra for simplicity.
Speed.java :
10. Serving Layer implementation - Serving layer can be implemented as a RESTful web service which will query
Cassandra tables to get the final result in real time.
11. Unique Page Views
References and image credits
http://www.databasetube.com/database/big-data-lambda-architecture/
Big Data Principles and best practices of scalable real time data systems by Nathan Marz and James Warren