LAMBDA ARCHITECTURE
SZILVESZTER MOLNAR
A NEW PARADIGM
▸ 30,000 GB of data generated every second
▸ Traditional database systems hit their limits and failed
▸ New breed of technologies
▸ MapReduce
▸ Distributed Key/Value Stores
WEB ANALYTICS APP
WEB ANALYTICS APPLICATION
▸ Track the number of page views for each URL
TRADITIONAL DATABASE
Column Name
id
user_id
url
pageviews
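As a rough illustration of the fully incremental approach, the table above could be created and updated as follows. This is a minimal sketch using SQLite from the Python standard library; the column types, the uniqueness constraint, and the helper names `open_db`, `record_pageview` and `pageviews_for_url` are assumptions, not part of the original slides.

```python
import sqlite3


def open_db():
    """Create an in-memory database with the four columns from the slide.

    Column types and the UNIQUE constraint are assumptions for this sketch.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pageviews ("
        " id INTEGER PRIMARY KEY,"
        " user_id INTEGER NOT NULL,"
        " url TEXT NOT NULL,"
        " pageviews INTEGER NOT NULL DEFAULT 0,"
        " UNIQUE (user_id, url))"
    )
    return conn


def record_pageview(conn, user_id, url):
    """Increment the counter in place, inserting the row on first sight."""
    cur = conn.execute(
        "UPDATE pageviews SET pageviews = pageviews + 1"
        " WHERE user_id = ? AND url = ?",
        (user_id, url),
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO pageviews (user_id, url, pageviews) VALUES (?, ?, 1)",
            (user_id, url),
        )
    conn.commit()


def pageviews_for_url(conn, url):
    """Answer the application's question: total page views for a URL."""
    row = conn.execute(
        "SELECT COALESCE(SUM(pageviews), 0) FROM pageviews WHERE url = ?",
        (url,),
    ).fetchone()
    return row[0]
```

Every page view is one write against the database, which is where the problems on the following slides begin.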
PROBLEMS START
"Timeout error on inserting to the database"
PROBLEMS START
WEB SERVER → (pageview) → QUEUE → WORKER → (100 at a time) → DB
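The queue-and-worker setup in the diagram can be sketched as below. This is only an illustration: an in-process `queue.Queue` stands in for a real message queue, a plain dict stands in for the database, and the names `take_batch` and `apply_batch` are hypothetical.

```python
import queue
from collections import Counter

BATCH_SIZE = 100  # from the slide: the worker handles 100 pageviews at a time


def take_batch(pageview_queue, batch_size=BATCH_SIZE):
    """Remove up to batch_size pageview URLs from the queue without blocking."""
    batch = []
    while len(batch) < batch_size:
        try:
            batch.append(pageview_queue.get_nowait())
        except queue.Empty:
            break
    return batch


def apply_batch(db_counts, batch):
    """Collapse a batch into one increment per URL.

    db_counts is a dict standing in for the database (an assumption).
    Returns the number of writes performed.
    """
    increments = Counter(batch)
    for url, n in increments.items():
        db_counts[url] = db_counts.get(url, 0) + n
    return len(increments)
```

Batching turns up to 100 single-row updates into at most one write per distinct URL, which relieves the database without changing the overall design.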
NEXT STEP - SHARDING THE DATABASE
▸ How to scale a write-heavy relational database?
▸ horizontal partitioning
▸ sharding
▸ You migrate data to 4 shards
▸ Your app is getting more & more popular
▸ resharding gets more & more painful
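One common way to pick a shard is to hash the key and take it modulo the number of shards. The sketch below assumes that scheme (the slides do not say which one was used); `shard_for` and `moved_keys` are hypothetical helper names.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a hash that is stable across runs."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def moved_keys(keys, old_shards, new_shards):
    """Return the keys whose shard changes when the shard count changes."""
    return [
        k for k in keys
        if shard_for(k, old_shards) != shard_for(k, new_shards)
    ]


if __name__ == "__main__":
    urls = [f"example.com/page/{i}" for i in range(1000)]
    moved = moved_keys(urls, 4, 8)
    print(f"{len(moved)} of {len(urls)} keys move when going from 4 to 8 shards")
```

With plain modulo hashing, changing the shard count relocates a large share of the keys, which is one reason each resharding step hurts more than the last.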
EVEN MORE PROBLEMS
▸ Fault-tolerance issues
▸ disks fail often
▸ add read replica
▸ Corruption issues
▸ deploy a bug, notice one day later
PROBLEMS OF "FULLY INCREMENTAL" ARCHITECTURES
APPLICATION DATABASE
PROBLEMS OF "FULLY INCREMENTAL" ARCHITECTURES
▸ Operational Complexity
▸ Eventual Consistency
▸ Lack of human-fault tolerance
SURELY WE CAN DO BETTER
PRINCIPLES
"A data system answers questions based on information that
was acquired in the past up to the present."
Big Data, Nathan Marz
PRINCIPLES
▸ What is this person's name?
▸ How many friends does this person have?
▸ What is my current balance?
PRINCIPLES
▸ Not all bits of information are equal
▸ Some are derived from other pieces of information
PRINCIPLES
▸ How many friends does this person have?
▸ friend list changes (add / remove friends)
▸ What is my current balance?
▸ transaction history
PRINCIPLES
data
PRINCIPLES
query result = function(all data)
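To make the equation concrete, the two derived questions from the earlier slides (friend count and current balance) can be written as pure functions over the raw data. This is a minimal sketch; the tuple layouts for friend-list changes and transactions are assumptions chosen for illustration.

```python
def friend_count(friend_events, person):
    """Derive the number of friends from the full history of list changes.

    friend_events: iterable of (person, action, friend) tuples, where
    action is "add" or "remove" (a hypothetical record layout).
    """
    friends = set()
    for who, action, friend in friend_events:
        if who != person:
            continue
        if action == "add":
            friends.add(friend)
        elif action == "remove":
            friends.discard(friend)
    return len(friends)


def current_balance(transactions, account):
    """Derive the balance from the full transaction history.

    transactions: iterable of (account, amount) tuples; deposits are
    positive and withdrawals negative (a hypothetical record layout).
    """
    return sum(amount for acct, amount in transactions if acct == account)
```

Neither answer is stored anywhere: both are recomputed from the raw records, so the records are the only data that has to be kept correct.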
DESIRED PROPERTIES OF A BIG DATA SYSTEM
▸ Robustness & Fault tolerance
▸ Low latency reads & updates
▸ Scalability
LAMBDA ARCHITECTURE
BATCH LAYER
SERVING LAYER
SPEED LAYER
BATCH LAYER
query result = function(all data)
BATCH LAYER
batch computed view = function(all data)
query result = function(batch computed view)
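For the page-view example, the two equations might look like this. It is a sketch under assumptions: the master dataset is modelled as a list of dicts with a `url` field, and `compute_batch_view` and `query_pageviews` are hypothetical names.

```python
from collections import Counter


def compute_batch_view(master_dataset):
    """batch computed view = function(all data).

    Scans every raw pageview record and precomputes a count per URL.
    """
    return Counter(record["url"] for record in master_dataset)


def query_pageviews(batch_view, url):
    """query result = function(batch computed view): a cheap lookup."""
    return batch_view.get(url, 0)


if __name__ == "__main__":
    master_dataset = [
        {"url": "/home", "user_id": 1},
        {"url": "/home", "user_id": 2},
        {"url": "/about", "user_id": 1},
    ]
    view = compute_batch_view(master_dataset)
    print(query_pageviews(view, "/home"))
```

Because the view is always rebuilt from all the data, a view produced by buggy code can be discarded and recomputed once the bug is fixed.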
BATCH LAYER
ALL DATA → BATCH LAYER → BATCH VIEW, BATCH VIEW, BATCH VIEW
SERVING LAYER
BATCH LAYER (MASTER DATASET) → SERVING LAYER (BATCH VIEW, BATCH VIEW, BATCH VIEW) → QUERY
DESIRED PROPERTIES OF A BIG DATA SYSTEM
▸ Satisfied
▸ Robustness & Fault tolerance
▸ Scalability
▸ Not Satisfied
▸ Low latency reads & updates
SPEED LAYER
▸ A batch layer run takes several hours, so batch views lag behind
▸ The speed layer looks only at new data
SPEED LAYER
realtime views = function(realtime views, new data)
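Unlike the batch layer, this update is incremental: it folds new records into the existing view instead of rescanning everything. A minimal sketch for the page-view counter, assuming records are dicts with a `url` field and using the hypothetical name `update_realtime_view`:

```python
def update_realtime_view(realtime_view, new_data):
    """realtime view = function(realtime view, new data).

    Returns an updated copy of the view; only the new records are read,
    never the full dataset.
    """
    updated = dict(realtime_view)
    for record in new_data:
        url = record["url"]
        updated[url] = updated.get(url, 0) + 1
    return updated


if __name__ == "__main__":
    view = {}
    view = update_realtime_view(view, [{"url": "/home"}, {"url": "/about"}])
    view = update_realtime_view(view, [{"url": "/home"}])
    print(view)
```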
BATCH LAYER, SERVING LAYER, SPEED LAYER
batch view = function(all data)
realtime views = function(realtime views, new data)
query result = function(batch view, realtime view)
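Put together for the page-view counter, the three equations could be sketched as follows. The data model (a list of URL strings as the master dataset) and the function names are assumptions for illustration; real systems would use distributed storage and processing for each layer.

```python
from collections import Counter


def compute_batch_view(master_dataset):
    """batch view = function(all data): full recount over every record."""
    return Counter(master_dataset)


def update_realtime_view(realtime_view, new_data):
    """realtime view = function(realtime view, new data): incremental."""
    updated = dict(realtime_view)
    for url in new_data:
        updated[url] = updated.get(url, 0) + 1
    return updated


def query(batch_view, realtime_view, url):
    """query result = function(batch view, realtime view): merge both."""
    return batch_view.get(url, 0) + realtime_view.get(url, 0)


if __name__ == "__main__":
    master_dataset = ["/home", "/home", "/about"]
    batch_view = compute_batch_view(master_dataset)  # slow, covers old data

    # New data is appended to the master dataset and, in parallel,
    # folded into the realtime view.
    new_data = ["/home", "/contact"]
    master_dataset.extend(new_data)
    realtime_view = update_realtime_view({}, new_data)
    print(query(batch_view, realtime_view, "/home"))

    # Once the next batch run has absorbed the new data, the realtime
    # entries it covers can be discarded.
    batch_view = compute_batch_view(master_dataset)
    realtime_view = {}
    print(query(batch_view, realtime_view, "/home"))
```

In this sketch both queries return the same count, since the realtime view only covers data the batch view has not yet seen.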
SPEED LAYER
NEW DATA → BATCH LAYER (MASTER DATASET) → SERVING LAYER (BATCH VIEW, BATCH VIEW, BATCH VIEW) → QUERY
NEW DATA → SPEED LAYER (REALTIME VIEW, REALTIME VIEW, REALTIME VIEW) → QUERY
Questions?
Thanks!
