LAMBDA ARCHITECTURE
Speed layer: Real-time views
BIS2013
Database Systems
Prof. Duc Khanh Tran, PhD
1st Hai Vo
2nd Huy Huynh
3rd Tin Ho
BIS2013-RGB GROUP 2
Hai Vo The Speed Layer
Computing & Storing real-time views
Huy Huynh Challenges of incremental algorithms
Asynchronous versus synchronous updates
Tin Ho Expiring real-time views
Cassandra: an example speed layer view
Conclusion
Agenda
Motivation
The update latency requirements vary a
great deal between applications.
Some applications require updates to propagate
immediately, while in other applications a
latency of a few hours is fine.
BIS2013-RGB GROUP 3
Serving Layer
Motivation
BIS2013-RGB GROUP 4
NEW DATA
Master Data
Batch Layer
Batch
View
Batch
View
Batch
View
Robust & fault-tolerant
Low latency
reads
Allows ad hoc queries
Low latency
update
Speed Layer
Real - Time
View
Real - Time
View
Speed Layer vs Batch Layer
The speed layer is similar to the batch layer in
that it produces views based on data it receives.
Incremental updates as opposed to
recomputation updates.
Only produces views on recent data vs produces
views on the entire dataset.
BIS2013-RGB GROUP 5
Speed Layer
Processing data on a smaller scale
Requires databases that support random
reads and random writes
More complex and thus more prone to error.
The speed layer views are transient.
BIS2013-RGB GROUP 6
Two major facets of the speed layer
Storing the real-time views and processing
the incoming data stream so as to update
those views
BIS2013-RGB GROUP 7
Computing Real-time Views
BIS2013-RGB GROUP 8
Real-time View = function (recent data)
ALL DATA Recent Data
Batch Views Real-time Views
Computing Real-time Views
BIS2013-RGB GROUP 9
Real-time View = function (new data, previous real-time view)
ALL DATA Recent Data
Batch Views Real-time Views
Real-time Views
(Updated)
Storing Real-time Views
Low-latency random reads, and using
incremental algorithms necessitates low-latency
random updates
BIS2013-RGB GROUP 10
Random reads Fault-tolerantRandom writes Scalability
NoSQL database i.e. Cassandra
Challenges of incremental
computation
Incremental algorithms are less general and less
human fault-tolerant, but higher performance
Challenge in a real-time context: the
interaction between incremental algorithms
and the CAP theorem
BIS2013-RGB GROUP 11
The CAP Theorem
BIS2013-RGB GROUP 12
Consistency
Partition
Tolerance
Availability
“You can have at most two of Consistency,
Availability, and Partition-tolerance"
“When a distributed data
system is Partitioned, it can be Consistent or Available
but not both"
the system continues to
operate despite arbitrary
message loss or failure of
part of the system
every request receives a
response
all nodes see the same
data at the same time
C vs A
When you choose consistency, sometimes a
query will receive an error instead of an answer
When you choose availability, at best you will
have where eventual consistency
BIS2013-RGB GROUP 13
Replication in Distributed system
Sally New York
BIS2013-RGB GROUP 14
Joe Paris
Maria Tokyo
Sally New York
Maria Tokyo
Sally New York
Maria Tokyo
Joe Paris
Maria Tokyo
Eventual consistency
BIS2013-RGB GROUP 15
BIS2013-RGB GROUP 16
Eventual consistency counting
Complexity in a real-time eventually consistent
context
Read and repair algorithms are needed
BIS2013-RGB GROUP 17
Interaction between the CAP theorem
and incremental algorithms
Unfortunately there is no escape from this
complexity if you want eventual consistency in
the speed layer
If the real-time view becomes corrupted, the
batch/serving layers will later automatically
correct the mistake in the serving layer views
BIS2013-RGB GROUP 18
Keep calm with Lambda
Synchronous updates
BIS2013-RGB GROUP 19
The application issues a request directly to the
database and blocks until the update is
processed
Asynchronous updates
BIS2013-RGB GROUP 20
Asynchronous update requests are placed in
a queue with the updates occurring at a later
time
Asynchronous versus synchronous
updates
Synchronous Asynchronous
Fast Slower
Coordinate the update with
other aspects of the application
Not coordinate
May overload the database Not overload the database
BIS2013-RGB GROUP 21
Expiring real-time views
Since the simpler batch and serving layers
continuously override the speed layer, the
speed layer views only need to represent data
yet to be processed by the batch computation
workflow
BIS2013-RGB GROUP 22
Expiring real-time views
Instead, we present a generic approach for
expiring speed layer views that works
regardless of the speed layer databases
being used. To understand this approach, let's
first get an understanding of what exactly needs
to be expired each time the serving layer is
updated
BIS2013-RGB GROUP 23
Expiring real-time views
BIS2013-RGB GROUP 24
Expiring real-time views
BIS2013-RGB GROUP 25
Expiring real-time views
BIS2013-RGB GROUP 26
Expiring real-time views
BIS2013-RGB GROUP 27
Cassandra's data model
What do you have ever heard about it?
Column oriented/family database
Key value based database
NOSql database
Real time operational data store
Distributed database
...
Cassandra's data model
Some definitions:
Key space: is the outermost container for data in
Cassandra.
Column families: is a container for a collection of
rows. Each row contains ordered columns.
Columns: is the most basic unit of data in
Cassandra data model. Consists of
<key,value,timestamp> triplet.
Cassandra's data model
Column family example:
Cassandra's data model
Cassandra features make it suitable for real-time
view:
Partitions keys among nodes in cluster:
RandomPatitioner and OrderPreservingPatitioner
Composite columns.
Using Cassandra!
Conclusion
Theoretical model of the speed layer
CAP theorem
Incremental computation in stead of batch
computation
Store speed layer view (real-time)
References
BIS2013-RGB GROUP 34
Big Data, Principles and best practices of
scalable real-time data system – Nathan Marz

Speed layer : Real time views in LAMBDA architecture

  • 1.
    LAMBDA ARCHITECTURE Speed layer:Real-time views BIS2013 Database Systems Prof. Duc Khanh Tran, PhD 1st Hai Vo 2nd Huy Huynh 3rd Tin Ho
  • 2.
    BIS2013-RGB GROUP 2 HaiVo The Speed Layer Computing & Storing real-time views Huy Huynh Challenges of incremental algorithms Asynchronous versus synchronous updates Tin Ho Expiring real-time views Cassandra: an example speed layer view Conclusion Agenda
  • 3.
    Motivation The update latencyrequirements vary a great deal between applications. Some applications require updates to propagate immediately, while in other applications a latency of a few hours is fine. BIS2013-RGB GROUP 3
  • 4.
    Serving Layer Motivation BIS2013-RGB GROUP4 NEW DATA Master Data Batch Layer Batch View Batch View Batch View Robust & fault-tolerant Low latency reads Allows ad hoc queries Low latency update Speed Layer Real - Time View Real - Time View
  • 5.
    Speed Layer vsBatch Layer The speed layer is similar to the batch layer in that it produces views based on data it receives. Incremental updates as opposed to recomputation updates. Only produces views on recent data vs produces views on the entire dataset. BIS2013-RGB GROUP 5
  • 6.
    Speed Layer Processing dataon a smaller scale Requires databases that support random reads and random writes More complex and thus more prone to error. The speed layer views are transient. BIS2013-RGB GROUP 6
  • 7.
    Two major facetsof the speed layer Storing the real-time views and processing the incoming data stream so as to update those views BIS2013-RGB GROUP 7
  • 8.
    Computing Real-time Views BIS2013-RGBGROUP 8 Real-time View = function (recent data) ALL DATA Recent Data Batch Views Real-time Views
  • 9.
    Computing Real-time Views BIS2013-RGBGROUP 9 Real-time View = function (new data, previous real-time view) ALL DATA Recent Data Batch Views Real-time Views Real-time Views (Updated)
  • 10.
    Storing Real-time Views Low-latencyrandom reads, and using incremental algorithms necessitates low-latency random updates BIS2013-RGB GROUP 10 Random reads Fault-tolerantRandom writes Scalability NoSQL database i.e. Cassandra
  • 11.
    Challenges of incremental computation Incrementalalgorithms are less general and less human fault-tolerant, but higher performance Challenge in a real-time context: the interaction between incremental algorithms and the CAP theorem BIS2013-RGB GROUP 11
  • 12.
    The CAP Theorem BIS2013-RGBGROUP 12 Consistency Partition Tolerance Availability “You can have at most two of Consistency, Availability, and Partition-tolerance" “When a distributed data system is Partitioned, it can be Consistent or Available but not both" the system continues to operate despite arbitrary message loss or failure of part of the system every request receives a response all nodes see the same data at the same time
  • 13.
    C vs A Whenyou choose consistency, sometimes a query will receive an error instead of an answer When you choose availability, at best you will have where eventual consistency BIS2013-RGB GROUP 13
  • 14.
    Replication in Distributedsystem Sally New York BIS2013-RGB GROUP 14 Joe Paris Maria Tokyo Sally New York Maria Tokyo Sally New York Maria Tokyo Joe Paris Maria Tokyo
  • 15.
  • 16.
    BIS2013-RGB GROUP 16 Eventualconsistency counting
  • 17.
    Complexity in areal-time eventually consistent context Read and repair algorithms are needed BIS2013-RGB GROUP 17 Interaction between the CAP theorem and incremental algorithms
  • 18.
    Unfortunately there isno escape from this complexity if you want eventual consistency in the speed layer If the real-time view becomes corrupted, the batch/serving layers will later automatically correct the mistake in the serving layer views BIS2013-RGB GROUP 18 Keep calm with Lambda
  • 19.
    Synchronous updates BIS2013-RGB GROUP19 The application issues a request directly to the database and blocks until the update is processed
  • 20.
    Asynchronous updates BIS2013-RGB GROUP20 Asynchronous update requests are placed in a queue with the updates occurring at a later time
  • 21.
    Asynchronous versus synchronous updates SynchronousAsynchronous Fast Slower Coordinate the update with other aspects of the application Not coordinate May overload the database Not overload the database BIS2013-RGB GROUP 21
  • 22.
    Expiring real-time views Sincethe simpler batch and serving layers continuously override the speed layer, the speed layer views only need to represent data yet to be processed by the batch computation workflow BIS2013-RGB GROUP 22
  • 23.
    Expiring real-time views Instead,we present a generic approach for expiring speed layer views that works regardless of the speed layer databases being used. To understand this approach, let's first get an understanding of what exactly needs to be expired each time the serving layer is updated BIS2013-RGB GROUP 23
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.
    Cassandra's data model Whatdo you have ever heard about it? Column oriented/family database Key value based database NOSql database Real time operational data store Distributed database ...
  • 29.
    Cassandra's data model Somedefinitions: Key space: is the outermost container for data in Cassandra. Column families: is a container for a collection of rows. Each row contains ordered columns. Columns: is the most basic unit of data in Cassandra data model. Consists of <key,value,timestamp> triplet.
  • 30.
  • 31.
    Cassandra's data model Cassandrafeatures make it suitable for real-time view: Partitions keys among nodes in cluster: RandomPatitioner and OrderPreservingPatitioner Composite columns.
  • 32.
  • 33.
    Conclusion Theoretical model ofthe speed layer CAP theorem Incremental computation in stead of batch computation Store speed layer view (real-time)
  • 34.
    References BIS2013-RGB GROUP 34 BigData, Principles and best practices of scalable real-time data system – Nathan Marz