SlideShare a Scribd company logo
1 of 34
LAMBDA ARCHITECTURE
Speed layer: Real-time views
BIS2013
Database Systems
Prof. Duc Khanh Tran, PhD
1st Hai Vo
2nd Huy Huynh
3rd Tin Ho
BIS2013-RGB GROUP 2
Hai Vo The Speed Layer
Computing & Storing real-time views
Huy Huynh Challenges of incremental algorithms
Asynchronous versus synchronous updates
Tin Ho Expiring real-time views
Cassandra: an example speed layer view
Conclusion
Agenda
Motivation
The update latency requirements vary a
great deal between applications.
Some applications require updates to propagate
immediately, while in other applications a
latency of a few hours is fine.
BIS2013-RGB GROUP 3
Serving Layer
Motivation
BIS2013-RGB GROUP 4
NEW DATA
Master Data
Batch Layer
Batch
View
Batch
View
Batch
View
Robust & fault-tolerant
Low latency
reads
Allows ad hoc queries
Low latency
update
Speed Layer
Real - Time
View
Real - Time
View
Speed Layer vs Batch Layer
The speed layer is similar to the batch layer in
that it produces views based on data it receives.
Incremental updates as opposed to
recomputation updates.
Only produces views on recent data vs produces
views on the entire dataset.
BIS2013-RGB GROUP 5
Speed Layer
Processing data on a smaller scale
Requires databases that support random
reads and random writes
More complex and thus more prone to error.
The speed layer views are transient.
BIS2013-RGB GROUP 6
Two major facets of the speed layer
Storing the real-time views and processing
the incoming data stream so as to update
those views
BIS2013-RGB GROUP 7
Computing Real-time Views
BIS2013-RGB GROUP 8
Real-time View = function (recent data)
ALL DATA Recent Data
Batch Views Real-time Views
Computing Real-time Views
BIS2013-RGB GROUP 9
Real-time View = function (new data, previous real-time view)
ALL DATA Recent Data
Batch Views Real-time Views
Real-time Views
(Updated)
Storing Real-time Views
Low-latency random reads, and using
incremental algorithms necessitates low-latency
random updates
BIS2013-RGB GROUP 10
Random reads Fault-tolerantRandom writes Scalability
NoSQL database i.e. Cassandra
Challenges of incremental
computation
Incremental algorithms are less general and less
human fault-tolerant, but higher performance
Challenge in a real-time context: the
interaction between incremental algorithms
and the CAP theorem
BIS2013-RGB GROUP 11
The CAP Theorem
BIS2013-RGB GROUP 12
Consistency
Partition
Tolerance
Availability
“You can have at most two of Consistency,
Availability, and Partition-tolerance"
“When a distributed data
system is Partitioned, it can be Consistent or Available
but not both"
the system continues to
operate despite arbitrary
message loss or failure of
part of the system
every request receives a
response
all nodes see the same
data at the same time
C vs A
When you choose consistency, sometimes a
query will receive an error instead of an answer
When you choose availability, at best you will
have where eventual consistency
BIS2013-RGB GROUP 13
Replication in Distributed system
Sally New York
BIS2013-RGB GROUP 14
Joe Paris
Maria Tokyo
Sally New York
Maria Tokyo
Sally New York
Maria Tokyo
Joe Paris
Maria Tokyo
Eventual consistency
BIS2013-RGB GROUP 15
BIS2013-RGB GROUP 16
Eventual consistency counting
Complexity in a real-time eventually consistent
context
Read and repair algorithms are needed
BIS2013-RGB GROUP 17
Interaction between the CAP theorem
and incremental algorithms
Unfortunately there is no escape from this
complexity if you want eventual consistency in
the speed layer
If the real-time view becomes corrupted, the
batch/serving layers will later automatically
correct the mistake in the serving layer views
BIS2013-RGB GROUP 18
Keep calm with Lambda
Synchronous updates
BIS2013-RGB GROUP 19
The application issues a request directly to the
database and blocks until the update is
processed
Asynchronous updates
BIS2013-RGB GROUP 20
Asynchronous update requests are placed in
a queue with the updates occurring at a later
time
Asynchronous versus synchronous
updates
Synchronous Asynchronous
Fast Slower
Coordinate the update with
other aspects of the application
Not coordinate
May overload the database Not overload the database
BIS2013-RGB GROUP 21
Expiring real-time views
Since the simpler batch and serving layers
continuously override the speed layer, the
speed layer views only need to represent data
yet to be processed by the batch computation
workflow
BIS2013-RGB GROUP 22
Expiring real-time views
Instead, we present a generic approach for
expiring speed layer views that works
regardless of the speed layer databases
being used. To understand this approach, let's
first get an understanding of what exactly needs
to be expired each time the serving layer is
updated
BIS2013-RGB GROUP 23
Expiring real-time views
BIS2013-RGB GROUP 24
Expiring real-time views
BIS2013-RGB GROUP 25
Expiring real-time views
BIS2013-RGB GROUP 26
Expiring real-time views
BIS2013-RGB GROUP 27
Cassandra's data model
What do you have ever heard about it?
Column oriented/family database
Key value based database
NOSql database
Real time operational data store
Distributed database
...
Cassandra's data model
Some definitions:
Key space: is the outermost container for data in
Cassandra.
Column families: is a container for a collection of
rows. Each row contains ordered columns.
Columns: is the most basic unit of data in
Cassandra data model. Consists of
<key,value,timestamp> triplet.
Cassandra's data model
Column family example:
Cassandra's data model
Cassandra features make it suitable for real-time
view:
Partitions keys among nodes in cluster:
RandomPatitioner and OrderPreservingPatitioner
Composite columns.
Using Cassandra!
Conclusion
Theoretical model of the speed layer
CAP theorem
Incremental computation in stead of batch
computation
Store speed layer view (real-time)
References
BIS2013-RGB GROUP 34
Big Data, Principles and best practices of
scalable real-time data system – Nathan Marz

More Related Content

What's hot

Master Data Management – Aligning Data, Process, and Governance
Master Data Management – Aligning Data, Process, and GovernanceMaster Data Management – Aligning Data, Process, and Governance
Master Data Management – Aligning Data, Process, and GovernanceDATAVERSITY
 
Thrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased ComparisonThrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased ComparisonIgor Anishchenko
 
Temporal Databases: Queries
Temporal Databases: QueriesTemporal Databases: Queries
Temporal Databases: Queriestorp42
 
Modern Algorithms and Data Structures - 1. Bloom Filters, Merkle Trees
Modern Algorithms and Data Structures - 1. Bloom Filters, Merkle TreesModern Algorithms and Data Structures - 1. Bloom Filters, Merkle Trees
Modern Algorithms and Data Structures - 1. Bloom Filters, Merkle TreesLorenzo Alberton
 
Build data quality rules and data cleansing into your data pipelines
Build data quality rules and data cleansing into your data pipelinesBuild data quality rules and data cleansing into your data pipelines
Build data quality rules and data cleansing into your data pipelinesMark Kromer
 
Informatica student meterial
Informatica student meterialInformatica student meterial
Informatica student meterialSunil Kotthakota
 
Improving Data Literacy Around Data Architecture
Improving Data Literacy Around Data ArchitectureImproving Data Literacy Around Data Architecture
Improving Data Literacy Around Data ArchitectureDATAVERSITY
 
DAMA Feb2015 Mastering Master Data
DAMA Feb2015 Mastering Master DataDAMA Feb2015 Mastering Master Data
DAMA Feb2015 Mastering Master DataMary Levins, PMP
 
TOP_407070357-Data-Governance-Playbook.pptx
TOP_407070357-Data-Governance-Playbook.pptxTOP_407070357-Data-Governance-Playbook.pptx
TOP_407070357-Data-Governance-Playbook.pptxSabrinaLameiras1
 
Apache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataApache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataDataWorks Summit/Hadoop Summit
 
Thinking in systems slides
Thinking in systems slidesThinking in systems slides
Thinking in systems slidesThái Trần
 
Streaming SQL for Data Engineers: The Next Big Thing?
Streaming SQL for Data Engineers: The Next Big Thing?Streaming SQL for Data Engineers: The Next Big Thing?
Streaming SQL for Data Engineers: The Next Big Thing?Yaroslav Tkachenko
 
Agile & Data Modeling – How Can They Work Together?
Agile & Data Modeling – How Can They Work Together?Agile & Data Modeling – How Can They Work Together?
Agile & Data Modeling – How Can They Work Together?DATAVERSITY
 
Hands-On: Managing Slowly Changing Dimensions Using TD Workflow
Hands-On: Managing Slowly Changing Dimensions Using TD WorkflowHands-On: Managing Slowly Changing Dimensions Using TD Workflow
Hands-On: Managing Slowly Changing Dimensions Using TD WorkflowTreasure Data, Inc.
 
Product recommendation for Santander Bank customers
Product recommendation for Santander Bank customersProduct recommendation for Santander Bank customers
Product recommendation for Santander Bank customersSumit Saini
 
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...From Query Plan to Query Performance: Supercharging your Apache Spark Queries...
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...Databricks
 
Keeping the Pulse of Your Data:  Why You Need Data Observability 
Keeping the Pulse of Your Data:  Why You Need Data Observability Keeping the Pulse of Your Data:  Why You Need Data Observability 
Keeping the Pulse of Your Data:  Why You Need Data Observability Precisely
 

What's hot (20)

Master Data Management – Aligning Data, Process, and Governance
Master Data Management – Aligning Data, Process, and GovernanceMaster Data Management – Aligning Data, Process, and Governance
Master Data Management – Aligning Data, Process, and Governance
 
Thrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased ComparisonThrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased Comparison
 
Data warehouse testing
Data warehouse testingData warehouse testing
Data warehouse testing
 
Temporal Databases: Queries
Temporal Databases: QueriesTemporal Databases: Queries
Temporal Databases: Queries
 
Modern Algorithms and Data Structures - 1. Bloom Filters, Merkle Trees
Modern Algorithms and Data Structures - 1. Bloom Filters, Merkle TreesModern Algorithms and Data Structures - 1. Bloom Filters, Merkle Trees
Modern Algorithms and Data Structures - 1. Bloom Filters, Merkle Trees
 
Build data quality rules and data cleansing into your data pipelines
Build data quality rules and data cleansing into your data pipelinesBuild data quality rules and data cleansing into your data pipelines
Build data quality rules and data cleansing into your data pipelines
 
Informatica student meterial
Informatica student meterialInformatica student meterial
Informatica student meterial
 
Improving Data Literacy Around Data Architecture
Improving Data Literacy Around Data ArchitectureImproving Data Literacy Around Data Architecture
Improving Data Literacy Around Data Architecture
 
DAMA Feb2015 Mastering Master Data
DAMA Feb2015 Mastering Master DataDAMA Feb2015 Mastering Master Data
DAMA Feb2015 Mastering Master Data
 
TOP_407070357-Data-Governance-Playbook.pptx
TOP_407070357-Data-Governance-Playbook.pptxTOP_407070357-Data-Governance-Playbook.pptx
TOP_407070357-Data-Governance-Playbook.pptx
 
Apache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataApache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing data
 
Thinking in systems slides
Thinking in systems slidesThinking in systems slides
Thinking in systems slides
 
Vertica
VerticaVertica
Vertica
 
Streaming SQL for Data Engineers: The Next Big Thing?
Streaming SQL for Data Engineers: The Next Big Thing?Streaming SQL for Data Engineers: The Next Big Thing?
Streaming SQL for Data Engineers: The Next Big Thing?
 
Agile & Data Modeling – How Can They Work Together?
Agile & Data Modeling – How Can They Work Together?Agile & Data Modeling – How Can They Work Together?
Agile & Data Modeling – How Can They Work Together?
 
Hands-On: Managing Slowly Changing Dimensions Using TD Workflow
Hands-On: Managing Slowly Changing Dimensions Using TD WorkflowHands-On: Managing Slowly Changing Dimensions Using TD Workflow
Hands-On: Managing Slowly Changing Dimensions Using TD Workflow
 
Product recommendation for Santander Bank customers
Product recommendation for Santander Bank customersProduct recommendation for Santander Bank customers
Product recommendation for Santander Bank customers
 
Best Practices in MDM with Dan Power
Best Practices in MDM with Dan PowerBest Practices in MDM with Dan Power
Best Practices in MDM with Dan Power
 
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...From Query Plan to Query Performance: Supercharging your Apache Spark Queries...
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...
 
Keeping the Pulse of Your Data:  Why You Need Data Observability 
Keeping the Pulse of Your Data:  Why You Need Data Observability Keeping the Pulse of Your Data:  Why You Need Data Observability 
Keeping the Pulse of Your Data:  Why You Need Data Observability 
 

Viewers also liked

Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLLambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLhuguk
 
Apache Storm vs. Spark Streaming - two stream processing platforms compared
Apache Storm vs. Spark Streaming - two stream processing platforms comparedApache Storm vs. Spark Streaming - two stream processing platforms compared
Apache Storm vs. Spark Streaming - two stream processing platforms comparedGuido Schmutz
 
Big data real time architectures
Big data real time architecturesBig data real time architectures
Big data real time architecturesDaniel Marcous
 
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...Brian O'Neill
 
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...Brian O'Neill
 
A Critique of the CAP Theorem (Papers We Love @ Seattle)
A Critique of the CAP Theorem (Papers We Love @ Seattle)A Critique of the CAP Theorem (Papers We Love @ Seattle)
A Critique of the CAP Theorem (Papers We Love @ Seattle)Trevor Lalish-Menagh
 
Achieve big data analytic platform with lambda architecture on cloud
Achieve big data analytic platform with lambda architecture on cloudAchieve big data analytic platform with lambda architecture on cloud
Achieve big data analytic platform with lambda architecture on cloudScott Miao
 
DataEngConf SF16 - Unifying Real Time and Historical Analytics with the Lambd...
DataEngConf SF16 - Unifying Real Time and Historical Analytics with the Lambd...DataEngConf SF16 - Unifying Real Time and Historical Analytics with the Lambd...
DataEngConf SF16 - Unifying Real Time and Historical Analytics with the Lambd...Hakka Labs
 
Real time machine learning
Real time machine learningReal time machine learning
Real time machine learningVinoth Kannan
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Helena Edelson
 
Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014P. Taylor Goetz
 
Apache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - VerisignApache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - VerisignMichael Noll
 
Apache storm vs. Spark Streaming
Apache storm vs. Spark StreamingApache storm vs. Spark Streaming
Apache storm vs. Spark StreamingP. Taylor Goetz
 
OIDC16: Open Data in Belgium
OIDC16: Open Data in BelgiumOIDC16: Open Data in Belgium
OIDC16: Open Data in BelgiumBart Hanssens
 
Top Agile Metrics
Top Agile MetricsTop Agile Metrics
Top Agile MetricsXBOSoft
 
Lambda Architecture with Cassandra (Vaibhav Puranik, GumGum) | C* Summit 2016
Lambda Architecture with Cassandra (Vaibhav Puranik, GumGum) | C* Summit 2016Lambda Architecture with Cassandra (Vaibhav Puranik, GumGum) | C* Summit 2016
Lambda Architecture with Cassandra (Vaibhav Puranik, GumGum) | C* Summit 2016DataStax
 
NoSQL, Base VS ACID e Teorema CAP
NoSQL, Base VS ACID e Teorema CAPNoSQL, Base VS ACID e Teorema CAP
NoSQL, Base VS ACID e Teorema CAPAricelio Souza
 

Viewers also liked (20)

Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLLambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale ML
 
Arquitectura Lambda
Arquitectura LambdaArquitectura Lambda
Arquitectura Lambda
 
Apache Storm vs. Spark Streaming - two stream processing platforms compared
Apache Storm vs. Spark Streaming - two stream processing platforms comparedApache Storm vs. Spark Streaming - two stream processing platforms compared
Apache Storm vs. Spark Streaming - two stream processing platforms compared
 
Big data real time architectures
Big data real time architecturesBig data real time architectures
Big data real time architectures
 
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
 
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
 
A Critique of the CAP Theorem (Papers We Love @ Seattle)
A Critique of the CAP Theorem (Papers We Love @ Seattle)A Critique of the CAP Theorem (Papers We Love @ Seattle)
A Critique of the CAP Theorem (Papers We Love @ Seattle)
 
Achieve big data analytic platform with lambda architecture on cloud
Achieve big data analytic platform with lambda architecture on cloudAchieve big data analytic platform with lambda architecture on cloud
Achieve big data analytic platform with lambda architecture on cloud
 
DataEngConf SF16 - Unifying Real Time and Historical Analytics with the Lambd...
DataEngConf SF16 - Unifying Real Time and Historical Analytics with the Lambd...DataEngConf SF16 - Unifying Real Time and Historical Analytics with the Lambd...
DataEngConf SF16 - Unifying Real Time and Historical Analytics with the Lambd...
 
Real time machine learning
Real time machine learningReal time machine learning
Real time machine learning
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
 
Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014
 
Yahoo compares Storm and Spark
Yahoo compares Storm and SparkYahoo compares Storm and Spark
Yahoo compares Storm and Spark
 
Apache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - VerisignApache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - Verisign
 
Apache storm vs. Spark Streaming
Apache storm vs. Spark StreamingApache storm vs. Spark Streaming
Apache storm vs. Spark Streaming
 
OIDC16: Open Data in Belgium
OIDC16: Open Data in BelgiumOIDC16: Open Data in Belgium
OIDC16: Open Data in Belgium
 
Top Agile Metrics
Top Agile MetricsTop Agile Metrics
Top Agile Metrics
 
Lambda Architecture with Cassandra (Vaibhav Puranik, GumGum) | C* Summit 2016
Lambda Architecture with Cassandra (Vaibhav Puranik, GumGum) | C* Summit 2016Lambda Architecture with Cassandra (Vaibhav Puranik, GumGum) | C* Summit 2016
Lambda Architecture with Cassandra (Vaibhav Puranik, GumGum) | C* Summit 2016
 
Big data philly_jug
Big data philly_jugBig data philly_jug
Big data philly_jug
 
NoSQL, Base VS ACID e Teorema CAP
NoSQL, Base VS ACID e Teorema CAPNoSQL, Base VS ACID e Teorema CAP
NoSQL, Base VS ACID e Teorema CAP
 

Similar to Speed layer : Real time views in LAMBDA architecture

HbaseHivePigbyRohitDubey
HbaseHivePigbyRohitDubeyHbaseHivePigbyRohitDubey
HbaseHivePigbyRohitDubeyRohit Dubey
 
Whitepaper : Building an Efficient Microservices Architecture
Whitepaper : Building an Efficient Microservices ArchitectureWhitepaper : Building an Efficient Microservices Architecture
Whitepaper : Building an Efficient Microservices ArchitectureNewt Global Consulting LLC
 
17-NoSQL.pptx
17-NoSQL.pptx17-NoSQL.pptx
17-NoSQL.pptxlevichan1
 
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...DataStax Academy
 
Scalable analytics for iaas cloud availability
Scalable analytics for iaas cloud availabilityScalable analytics for iaas cloud availability
Scalable analytics for iaas cloud availabilityPapitha Velumani
 
Consistent join queries in cloud data stores
Consistent join queries in cloud data storesConsistent join queries in cloud data stores
Consistent join queries in cloud data storesJoão Gabriel Lima
 
NoSQL Options Compared
NoSQL Options ComparedNoSQL Options Compared
NoSQL Options ComparedSergey Bushik
 
Cassandra Consistency: Tradeoffs and Limitations
Cassandra Consistency: Tradeoffs and LimitationsCassandra Consistency: Tradeoffs and Limitations
Cassandra Consistency: Tradeoffs and LimitationsPanagiotis Papadopoulos
 
Benchmarking Couchbase Server for Interactive Applications
Benchmarking Couchbase Server for Interactive ApplicationsBenchmarking Couchbase Server for Interactive Applications
Benchmarking Couchbase Server for Interactive ApplicationsAltoros
 
Lambda architecture @ Indix
Lambda architecture @ IndixLambda architecture @ Indix
Lambda architecture @ IndixRajesh Muppalla
 
BIG DATA: From mammoth to elephant
BIG DATA: From mammoth to elephantBIG DATA: From mammoth to elephant
BIG DATA: From mammoth to elephantRoman Nikitchenko
 
Build Application With MongoDB
Build Application With MongoDBBuild Application With MongoDB
Build Application With MongoDBEdureka!
 
Events and metrics the Lifeblood of Webops
Events and metrics the Lifeblood of WebopsEvents and metrics the Lifeblood of Webops
Events and metrics the Lifeblood of WebopsDatadog
 
Big Data, Mob Scale.
Big Data, Mob Scale.Big Data, Mob Scale.
Big Data, Mob Scale.darach
 
Big Events, Mob Scale - Darach Ennis (Push Technology)
Big Events, Mob Scale - Darach Ennis (Push Technology)Big Events, Mob Scale - Darach Ennis (Push Technology)
Big Events, Mob Scale - Darach Ennis (Push Technology)jaxLondonConference
 
What is Scalability and How can affect on overall system performance of database
What is Scalability and How can affect on overall system performance of databaseWhat is Scalability and How can affect on overall system performance of database
What is Scalability and How can affect on overall system performance of databaseAlireza Kamrani
 
Comparison between mongo db and cassandra using ycsb
Comparison between mongo db and cassandra using ycsbComparison between mongo db and cassandra using ycsb
Comparison between mongo db and cassandra using ycsbsonalighai
 
Altoros using no sql databases for interactive_applications
Altoros using no sql databases for interactive_applicationsAltoros using no sql databases for interactive_applications
Altoros using no sql databases for interactive_applicationsJeff Harris
 

Similar to Speed layer : Real time views in LAMBDA architecture (20)

HbaseHivePigbyRohitDubey
HbaseHivePigbyRohitDubeyHbaseHivePigbyRohitDubey
HbaseHivePigbyRohitDubey
 
Whitepaper : Building an Efficient Microservices Architecture
Whitepaper : Building an Efficient Microservices ArchitectureWhitepaper : Building an Efficient Microservices Architecture
Whitepaper : Building an Efficient Microservices Architecture
 
NoSQL
NoSQLNoSQL
NoSQL
 
17-NoSQL.pptx
17-NoSQL.pptx17-NoSQL.pptx
17-NoSQL.pptx
 
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
 
Scalable analytics for iaas cloud availability
Scalable analytics for iaas cloud availabilityScalable analytics for iaas cloud availability
Scalable analytics for iaas cloud availability
 
Consistent join queries in cloud data stores
Consistent join queries in cloud data storesConsistent join queries in cloud data stores
Consistent join queries in cloud data stores
 
NoSQL Options Compared
NoSQL Options ComparedNoSQL Options Compared
NoSQL Options Compared
 
Cassandra Consistency: Tradeoffs and Limitations
Cassandra Consistency: Tradeoffs and LimitationsCassandra Consistency: Tradeoffs and Limitations
Cassandra Consistency: Tradeoffs and Limitations
 
Benchmarking Couchbase Server for Interactive Applications
Benchmarking Couchbase Server for Interactive ApplicationsBenchmarking Couchbase Server for Interactive Applications
Benchmarking Couchbase Server for Interactive Applications
 
Lambda architecture @ Indix
Lambda architecture @ IndixLambda architecture @ Indix
Lambda architecture @ Indix
 
S18 das
S18 dasS18 das
S18 das
 
BIG DATA: From mammoth to elephant
BIG DATA: From mammoth to elephantBIG DATA: From mammoth to elephant
BIG DATA: From mammoth to elephant
 
Build Application With MongoDB
Build Application With MongoDBBuild Application With MongoDB
Build Application With MongoDB
 
Events and metrics the Lifeblood of Webops
Events and metrics the Lifeblood of WebopsEvents and metrics the Lifeblood of Webops
Events and metrics the Lifeblood of Webops
 
Big Data, Mob Scale.
Big Data, Mob Scale.Big Data, Mob Scale.
Big Data, Mob Scale.
 
Big Events, Mob Scale - Darach Ennis (Push Technology)
Big Events, Mob Scale - Darach Ennis (Push Technology)Big Events, Mob Scale - Darach Ennis (Push Technology)
Big Events, Mob Scale - Darach Ennis (Push Technology)
 
What is Scalability and How can affect on overall system performance of database
What is Scalability and How can affect on overall system performance of databaseWhat is Scalability and How can affect on overall system performance of database
What is Scalability and How can affect on overall system performance of database
 
Comparison between mongo db and cassandra using ycsb
Comparison between mongo db and cassandra using ycsbComparison between mongo db and cassandra using ycsb
Comparison between mongo db and cassandra using ycsb
 
Altoros using no sql databases for interactive_applications
Altoros using no sql databases for interactive_applicationsAltoros using no sql databases for interactive_applications
Altoros using no sql databases for interactive_applications
 

Recently uploaded

Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 

Recently uploaded (20)

Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 

Speed layer : Real time views in LAMBDA architecture

  • 1. LAMBDA ARCHITECTURE Speed layer: Real-time views BIS2013 Database Systems Prof. Duc Khanh Tran, PhD 1st Hai Vo 2nd Huy Huynh 3rd Tin Ho
  • 2. BIS2013-RGB GROUP 2 Hai Vo The Speed Layer Computing & Storing real-time views Huy Huynh Challenges of incremental algorithms Asynchronous versus synchronous updates Tin Ho Expiring real-time views Cassandra: an example speed layer view Conclusion Agenda
  • 3. Motivation The update latency requirements vary a great deal between applications. Some applications require updates to propagate immediately, while in other applications a latency of a few hours is fine. BIS2013-RGB GROUP 3
  • 4. Serving Layer Motivation BIS2013-RGB GROUP 4 NEW DATA Master Data Batch Layer Batch View Batch View Batch View Robust & fault-tolerant Low latency reads Allows ad hoc queries Low latency update Speed Layer Real - Time View Real - Time View
  • 5. Speed Layer vs Batch Layer The speed layer is similar to the batch layer in that it produces views based on data it receives. Incremental updates as opposed to recomputation updates. Only produces views on recent data vs produces views on the entire dataset. BIS2013-RGB GROUP 5
  • 6. Speed Layer Processing data on a smaller scale Requires databases that support random reads and random writes More complex and thus more prone to error. The speed layer views are transient. BIS2013-RGB GROUP 6
  • 7. Two major facets of the speed layer Storing the real-time views and processing the incoming data stream so as to update those views BIS2013-RGB GROUP 7
  • 8. Computing Real-time Views BIS2013-RGB GROUP 8 Real-time View = function (recent data) ALL DATA Recent Data Batch Views Real-time Views
  • 9. Computing Real-time Views BIS2013-RGB GROUP 9 Real-time View = function (new data, previous real-time view) ALL DATA Recent Data Batch Views Real-time Views Real-time Views (Updated)
  • 10. Storing Real-time Views Low-latency random reads, and using incremental algorithms necessitates low-latency random updates BIS2013-RGB GROUP 10 Random reads Fault-tolerantRandom writes Scalability NoSQL database i.e. Cassandra
  • 11. Challenges of incremental computation Incremental algorithms are less general and less human fault-tolerant, but higher performance Challenge in a real-time context: the interaction between incremental algorithms and the CAP theorem BIS2013-RGB GROUP 11
  • 12. The CAP Theorem BIS2013-RGB GROUP 12 Consistency Partition Tolerance Availability “You can have at most two of Consistency, Availability, and Partition-tolerance" “When a distributed data system is Partitioned, it can be Consistent or Available but not both" the system continues to operate despite arbitrary message loss or failure of part of the system every request receives a response all nodes see the same data at the same time
  • 13. C vs A When you choose consistency, sometimes a query will receive an error instead of an answer When you choose availability, at best you will have where eventual consistency BIS2013-RGB GROUP 13
  • 14. Replication in Distributed system Sally New York BIS2013-RGB GROUP 14 Joe Paris Maria Tokyo Sally New York Maria Tokyo Sally New York Maria Tokyo Joe Paris Maria Tokyo
  • 16. BIS2013-RGB GROUP 16 Eventual consistency counting
  • 17. Complexity in a real-time eventually consistent context Read and repair algorithms are needed BIS2013-RGB GROUP 17 Interaction between the CAP theorem and incremental algorithms
  • 18. Unfortunately there is no escape from this complexity if you want eventual consistency in the speed layer If the real-time view becomes corrupted, the batch/serving layers will later automatically correct the mistake in the serving layer views BIS2013-RGB GROUP 18 Keep calm with Lambda
  • 19. Synchronous updates BIS2013-RGB GROUP 19 The application issues a request directly to the database and blocks until the update is processed
  • 20. Asynchronous updates BIS2013-RGB GROUP 20 Asynchronous update requests are placed in a queue with the updates occurring at a later time
  • 21. Asynchronous versus synchronous updates Synchronous Asynchronous Fast Slower Coordinate the update with other aspects of the application Not coordinate May overload the database Not overload the database BIS2013-RGB GROUP 21
  • 22. Expiring real-time views Since the simpler batch and serving layers continuously override the speed layer, the speed layer views only need to represent data yet to be processed by the batch computation workflow BIS2013-RGB GROUP 22
  • 23. Expiring real-time views Instead, we present a generic approach for expiring speed layer views that works regardless of the speed layer databases being used. To understand this approach, let's first get an understanding of what exactly needs to be expired each time the serving layer is updated BIS2013-RGB GROUP 23
  • 28. Cassandra's data model What do you have ever heard about it? Column oriented/family database Key value based database NOSql database Real time operational data store Distributed database ...
  • 29. Cassandra's data model Some definitions: Key space: is the outermost container for data in Cassandra. Column families: is a container for a collection of rows. Each row contains ordered columns. Columns: is the most basic unit of data in Cassandra data model. Consists of <key,value,timestamp> triplet.
  • 30. Cassandra's data model Column family example:
  • 31. Cassandra's data model Cassandra features make it suitable for real-time view: Partitions keys among nodes in cluster: RandomPatitioner and OrderPreservingPatitioner Composite columns.
  • 33. Conclusion Theoretical model of the speed layer CAP theorem Incremental computation in stead of batch computation Store speed layer view (real-time)
  • 34. References BIS2013-RGB GROUP 34 Big Data, Principles and best practices of scalable real-time data system – Nathan Marz