Narayan Kumar
Software Consultant
Knoldus Software LLP
Lambda Architecture with Spark
Agenda
● What is Lambda Architecture?
● Components of Lambda Architecture
● Advantages of Lambda Architecture
● Implementation with Spark and its Benefits
● Code Review & Demo
“Lambda architecture is a data-processing architecture designed to
handle massive quantities of data by taking advantage of both batch-
and stream-processing methods.”
Wikipedia
What is Lambda Architecture?
Coined by Nathan Marz
➢ Ex-Twitter engineer
➢ Creator of Apache Storm
Components of Lambda Architecture
Lambda architecture is broadly divided into three layers:
➢ Batch Layer
➢ Speed Layer
➢ Serving Layer
Overview of Lambda Architecture
https://www.mapr.com/developercentral/lambda-architecture
Batch Layer
In the Lambda Architecture, the batch layer precomputes the
master dataset into batch views so that queries can be resolved
with low latency.
Master Dataset
The master dataset is the source of truth in the Lambda Architecture.
Even if you were to lose all your serving layer datasets
and speed layer datasets, you could reconstruct your
application from the master dataset.
Data in the master dataset must have three properties:
➢ Data is raw
➢ Data is immutable
➢ Data is eternally true
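The three properties above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `PageView` record and the `MasterDataset` class are hypothetical names that do not come from the slides, and a real master dataset would live in a distributed file system rather than in memory.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import List, Tuple


@dataclass(frozen=True)
class PageView:
    """One raw fact: a user viewed a URL at a given time (hypothetical schema)."""
    user_id: str
    url: str
    timestamp: int  # seconds since the epoch; the fact stays true forever


class MasterDataset:
    """Append-only store: facts can be added, never updated or deleted."""

    def __init__(self) -> None:
        self._facts: List[PageView] = []

    def append(self, fact: PageView) -> None:
        self._facts.append(fact)

    def facts(self) -> Tuple[PageView, ...]:
        # Return an immutable snapshot so callers cannot modify the store.
        return tuple(self._facts)


master = MasterDataset()
master.append(PageView("alice", "/home", 1000))
master.append(PageView("bob", "/home", 1010))
master.append(PageView("alice", "/pricing", 1020))

try:
    master.facts()[0].url = "/changed"  # attempt to rewrite history
    mutation_rejected = False
except FrozenInstanceError:
    mutation_rejected = True
```

Because each record carries its own timestamp, a statement such as "alice viewed /home at time 1000" never becomes false later, which is what "eternally true" means here.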
Computing functions on the batch layer
As our master dataset is continually growing, we must have a
strategy for updating our batch views when new data becomes
available.
There are two suitable kinds of algorithm:
➢ Recomputation algorithms: throw away the old batch views
and recompute the functions over the entire master dataset.
➢ Incremental algorithms: update the existing views directly
when new data arrives.
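The difference between the two strategies can be seen on a small page-view count. The sketch below is an assumption-laden illustration: the `(user_id, url)` record shape and the function names `recompute_view` and `update_view` are hypothetical, and plain Python collections stand in for a distributed batch job.

```python
from collections import Counter
from typing import Iterable, Tuple

Event = Tuple[str, str]  # (user_id, url), a hypothetical record shape


def recompute_view(master_dataset: Iterable[Event]) -> Counter:
    """Recomputation: ignore any old view and rebuild it from all the data."""
    return Counter(url for _user, url in master_dataset)


def update_view(view: Counter, new_events: Iterable[Event]) -> Counter:
    """Incremental: fold only the newly arrived events into the existing view."""
    updated = view.copy()
    updated.update(url for _user, url in new_events)
    return updated


old_events = [("alice", "/home"), ("bob", "/home")]
new_events = [("alice", "/pricing"), ("carol", "/home")]

view_v1 = recompute_view(old_events)
incremental_v2 = update_view(view_v1, new_events)
recomputed_v2 = recompute_view(old_events + new_events)
```

For a simple count both strategies give the same view. Recomputation costs more per run but starts from the raw data every time, so a bug in an earlier view cannot carry over into the next one.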
Speed Layer
There are two major facets of the speed layer: storing the
realtime views and processing the incoming data stream so as
to update those views.
Storing realtime views
The underlying storage layer must meet the following requirements:
➢ Random reads: A realtime view should support fast random reads to
answer queries quickly.
➢ Random writes: To support incremental algorithms, it must also be
possible to modify a realtime view with low latency.
➢ Scalability: As with the serving layer views, the realtime views should
scale with the amount of data they store and the read/write rates required
by the application.
➢ Fault tolerance: If a disk or a machine crashes, a realtime view should
continue to function normally.
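The first two requirements (random reads and random writes) can be illustrated with an in-memory view that is updated one event at a time. This is a minimal sketch, not a production design: `RealtimeView` and `process_stream` are hypothetical names, and a single-process dictionary does not provide the scalability or fault tolerance that a real speed-layer store needs.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Event = Tuple[str, int]  # (url, timestamp), a hypothetical record shape


class RealtimeView:
    """Page-view counts keyed by URL, updated incrementally per event."""

    def __init__(self) -> None:
        self._counts: Dict[str, int] = defaultdict(int)

    def apply(self, event: Event) -> None:
        url, _timestamp = event
        self._counts[url] += 1  # random write: touch only the affected key

    def get(self, url: str) -> int:
        # Random read by key; .get avoids creating entries for unknown URLs.
        return self._counts.get(url, 0)


def process_stream(view: RealtimeView, stream: Iterable[Event]) -> None:
    """Consume incoming events and update the view as each one arrives."""
    for event in stream:
        view.apply(event)


realtime = RealtimeView()
process_stream(realtime, [("/home", 1030), ("/pricing", 1031), ("/home", 1032)])
home_views = realtime.get("/home")
```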
Serving Layer
In the Lambda Architecture, the serving layer provides low-latency
access to the results of calculations performed on the master
dataset. The serving layer views are slightly out of date due to the
time required for batch computation.
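Because the batch views lag behind, a query in the Lambda Architecture is typically answered by combining the batch view with the realtime view that covers the most recent data. The sketch below assumes both views are simple URL-to-count mappings; `query_pageviews` and the sample numbers are hypothetical.

```python
from typing import Mapping


def query_pageviews(
    url: str,
    batch_view: Mapping[str, int],
    realtime_view: Mapping[str, int],
) -> int:
    """Merge the slightly stale batch count with the recent realtime count."""
    return batch_view.get(url, 0) + realtime_view.get(url, 0)


# Batch view: everything up to the last completed batch run.
batch_view = {"/home": 120, "/pricing": 45}
# Realtime view: only the events that arrived after that batch run.
realtime_view = {"/home": 3, "/signup": 1}

home_total = query_pageviews("/home", batch_view, realtime_view)
signup_total = query_pageviews("/signup", batch_view, realtime_view)
```

Addition works as the merge here only because the two views cover disjoint time ranges; other aggregates would need their own merge function.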
Requirements for a serving layer database
As with the speed layer, a serving layer database has the following requirements:
➢ Random reads: A serving layer database must support random reads,
with indexes providing direct access to small portions of the view.
➢ Batch writable: The batch views for a serving layer are produced from
scratch. When a new version of a view becomes available, it must be
possible to completely swap out the older version with the updated view.
➢ Scalability: A serving layer database must be capable of handling views
of arbitrary size.
➢ Fault tolerance: Because a serving layer database is distributed, it must
be tolerant of machine failures.
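The "batch writable" requirement, replacing a whole view in one step rather than updating it record by record, can be sketched as follows. `ServingView` is a hypothetical in-memory class used only for illustration; it shows the swap and the keyed read, not distribution or fault tolerance.

```python
import threading
from typing import Dict, Mapping, Optional


class ServingView:
    """Read-only view that is replaced wholesale when a new batch run finishes."""

    def __init__(self, initial: Mapping[str, int]) -> None:
        self._lock = threading.Lock()
        self._view: Dict[str, int] = dict(initial)
        self.version = 1

    def get(self, key: str) -> Optional[int]:
        # Random read: direct lookup by key.
        with self._lock:
            return self._view.get(key)

    def swap(self, new_view: Mapping[str, int]) -> None:
        # Build the new version first, then switch in one step so readers
        # never observe a half-written view.
        fresh = dict(new_view)
        with self._lock:
            self._view = fresh
            self.version += 1


serving = ServingView({"/home": 120})
before = serving.get("/home")
serving.swap({"/home": 150, "/pricing": 45})
after = serving.get("/home")
```

There is deliberately no method for modifying a single record: the only write path is a complete swap.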
Advantages of Lambda Architecture
Lambda architecture has the following advantages:
➢ Human fault tolerance: LA gives a big data system the ability to
tolerate human error.
➢ Operational complexity: It reduces the operational complexity of large
historical queries by splitting them into a precomputed query and an on-the-fly query.
➢ Resilience: LA is resilient because it is difficult for human errors or
hardware faults to corrupt data stored in the system, since the system does
not allow update or delete operations on existing data.
➢ Simple and maintainable: It is simple and easy to understand,
and its flexible architecture helps with maintenance.
Implementation with Spark and its Benefits
Implementing LA with Spark has the following benefits:
➢ Spark gives us a unified stack (Spark Core, Spark SQL, Spark
Streaming, MLlib, and GraphX), which makes it easier to implement LA.
➢ Spark has clean and easy-to-use APIs (far more readable and with
less boilerplate code than MapReduce).
➢ The biggest advantage Spark gave us in this case is Spark Streaming,
which allowed us to reuse the same aggregates we wrote for our
batch application on a real-time data stream.
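The reuse described in the last point can be shown in outline. The sketch below uses plain Python collections rather than the Spark API, so it only illustrates the idea: one aggregate function is written once and applied both to the full historical dataset and to each micro-batch of a stream. The hashtag-counting task, the function names, and the sample tweets are all hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def count_hashtags(tweets: Iterable[str]) -> Counter:
    """Shared aggregate: count hashtags in any collection of tweets."""
    return Counter(
        word.lower()
        for tweet in tweets
        for word in tweet.split()
        if word.startswith("#")
    )


def run_batch(all_tweets: List[str]) -> Counter:
    """Batch layer: apply the aggregate to the whole dataset."""
    return count_hashtags(all_tweets)


def run_streaming(micro_batches: Iterable[List[str]]) -> Iterator[Counter]:
    """Speed layer: apply the same aggregate to each small batch of new data."""
    for batch in micro_batches:
        yield count_hashtags(batch)


historical = ["Learning #Spark today", "#spark and #scala", "no tags here"]
incoming = [["#Spark streaming demo"], ["more #scala", "#Spark again"]]

batch_result = run_batch(historical)
stream_results = list(run_streaming(incoming))
```

Keeping the aggregation logic in one function means the batch and speed layers cannot drift apart, which is the practical benefit the slide attributes to Spark Streaming.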
References
Nathan Marz with James Warren, Big Data: Principles and Best Practices of Scalable Real-Time Data Systems
https://en.wikipedia.org/wiki/Lambda_architecture
https://www.mapr.com/developercentral/lambda-architecture
http://lambda-architecture.net/
Thank you

 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in NoidaBuds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprise
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
Advantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your BusinessAdvantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your Business
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 

Lambda Architecture with Spark

  • 1. Lambda Architecture with Spark. Narayan Kumar, Software Consultant, Knoldus Software LLP
  • 2. Agenda ● What is Lambda Architecture? ● Components of Lambda Architecture ● Advantages of Lambda Architecture ● Implementation with Spark and its Benefits ● Code Review & Demo
  • 4. What is Lambda Architecture? “Lambda architecture is a data-processing architecture designed to handle massive quantities of data by taking advantage of both batch- and stream-processing methods.” (Wikipedia) Coined by Nathan Marz ➢ Ex-Twitter engineer ➢ Creator of Apache Storm
  • 6. Components of Lambda Architecture The Lambda Architecture is broadly divided into three layers: ➢ Batch Layer ➢ Speed Layer ➢ Serving Layer
  • 7. Overview of Lambda Architecture (source: https://www.mapr.com/developercentral/lambda-architecture)
  • 8. Batch Layer In the Lambda Architecture, the batch layer precomputes the master dataset into batch views so that queries can be resolved with low latency.
  • 9. Master Dataset The master dataset is the source of truth in the Lambda Architecture. Even if you were to lose all your serving layer datasets and speed layer datasets, you could reconstruct your application from the master dataset. Data in the master dataset must have three properties: ➢ Data is raw ➢ Data is immutable ➢ Data is eternally true
  • 10. Computing functions on the batch layer Because the master dataset is continually growing, we need a strategy for updating the batch views when new data becomes available. Two kinds of algorithm are suitable: ➢ Recomputation algorithms: throw away the old batch views and recompute the functions over the entire master dataset. ➢ Incremental algorithms: update the existing views directly when new data arrives.
  • 11. Speed Layer There are two major facets of the speed layer: storing the realtime views and processing the incoming data stream so as to update those views.
  • 12. Storing realtime views The underlying storage layer must meet the following requirements: ➢ Random reads: a realtime view should support fast random reads to answer queries quickly. ➢ Random writes: to support incremental algorithms, it must also be possible to modify a realtime view with low latency. ➢ Scalability: as with the serving layer views, the realtime views should scale with the amount of data they store and the read/write rates required by the application. ➢ Fault tolerance: if a disk or a machine crashes, a realtime view should continue to function normally.
  • 13. Serving Layer In the Lambda Architecture, the serving layer provides low-latency access to the results of calculations performed on the master dataset. The serving layer views are slightly out of date due to the time required for batch computation.
  • 14. Requirements for a serving layer database As with the speed layer, a serving layer database must meet the following requirements: ➢ Random reads: a serving layer database must support random reads, with indexes providing direct access to small portions of the view. ➢ Batch writable: the batch views for a serving layer are produced from scratch. When a new version of a view becomes available, it must be possible to completely swap out the older version for the updated view. ➢ Scalability: a serving layer database must be capable of handling views of arbitrary size. ➢ Fault tolerance: because a serving layer database is distributed, it must be tolerant of machine failures.
  • 16. Advantages of Lambda Architecture The Lambda Architecture (LA) has the following advantages: ➢ Human fault tolerance: LA makes a Big Data system tolerant of human error. ➢ Operational complexity: it reduces the operational complexity of large historical queries by splitting each one into a precomputed query and an on-the-fly query. ➢ Resilience: it is difficult for human errors or hardware faults to corrupt the data stored in the system, because the system does not allow update or delete operations on existing data. ➢ Simple & Maintainable: the architecture is simple and easy to understand, and its flexibility helps with maintenance.
  • 18. Implementation with Spark and its Benefits Implementing LA with Spark has the following benefits: ➢ Spark gives us a unified stack (Spark Core, Spark SQL, Spark Streaming, MLlib, and GraphX), which makes LA easier to implement. ➢ Spark has clean and easy-to-use APIs (far more readable and with less boilerplate code than MapReduce). ➢ The biggest advantage Spark gave us in this case is Spark Streaming, which allowed us to reuse the same aggregates we wrote for our batch application on a real-time data stream.
  • 19. References ➢ Nathan Marz with James Warren, Big Data: Principles and best practices of scalable real-time data systems ➢ https://en.wikipedia.org/wiki/Lambda_architecture ➢ https://www.mapr.com/developercentral/lambda-architecture ➢ http://lambda-architecture.net/
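
The interplay of the three layers described in the slides can be sketched in a few lines. This is a minimal illustration in plain Python rather than Spark, and it is not the code from the talk's demo: the page-view records and the names `recompute_batch_view`, `update_realtime_view`, and `query` are assumptions made up for the example. It shows a recomputation algorithm building the batch view, an incremental algorithm maintaining the realtime view, and a query that merges both.

```python
from collections import Counter

# Hypothetical master dataset: an append-only list of (page, timestamp) facts.
master_dataset = [("home", 1), ("about", 2), ("home", 3)]


def recompute_batch_view(records):
    """Recomputation algorithm: rebuild the view from the whole dataset."""
    return Counter(page for page, _ in records)


def update_realtime_view(view, record):
    """Incremental algorithm: adjust the existing view for one new record."""
    page, _ = record
    view[page] += 1


def query(page, batch_view, realtime_view):
    """Serve a query by merging the batch view with the realtime view."""
    return batch_view[page] + realtime_view[page]


batch_view = recompute_batch_view(master_dataset)

# New facts arrive after the batch run: append them and update the speed layer.
realtime_view = Counter()
for record in [("home", 4), ("contact", 5)]:
    master_dataset.append(record)
    update_realtime_view(realtime_view, record)

before = query("home", batch_view, realtime_view)

# The next batch run absorbs the new facts, so the realtime view can be reset.
batch_view = recompute_batch_view(master_dataset)
realtime_view = Counter()

after = query("home", batch_view, realtime_view)
print(before, after)
```

The query returns the same answer before and after the second batch run, which is the point of the design: the speed layer only has to cover the data that the batch layer has not yet processed.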

Editor's Notes

  1. 1. The batch layer runs functions over the master dataset to precompute intermediate data called batch views. 2. The speed layer compensates for the high latency of the batch layer by providing low-latency updates using data that has yet to be precomputed into a batch view. 3. Queries are then satisfied by processing data from the serving layer views and the speed layer views, and merging the results.
  2. 1
  3. 1. The batch layer runs functions over the master dataset to precompute intermediate data called batch views. 2. The batch layer has three components: a) Master dataset: an immutable, append-only dataset. b) Precomputing function: generally a MapReduce function that operates on the master dataset and produces a batch view. Precomputing functions are used for high-latency queries such as historical queries. c) Batch view: the outcome of the precomputing function.
  4. 1. The master dataset is the only part of the Lambda Architecture that absolutely must be safeguarded from corruption. 2. Data is raw: when designing your Big Data system, you want to be able to answer as many questions as possible. To do so we need to store raw data in the master dataset, because storing normalized data loses many facts about the data. How raw the data needs to be still depends on the use case. 3. Data is immutable: we cannot update or delete data, only append to the dataset. This has two vital advantages: a) Human-fault tolerance: if bad data is added by mistake and discovered later, we simply remove the bad data and recompute over the master dataset. b) Simplicity: an immutable dataset is simpler because it does not need to maintain the indexes that mutable data requires. 4. Data is eternally true: the key consequence of immutability is that each piece of data is true in perpetuity. That is, a piece of data, once true, must always be true. Immutability wouldn’t make sense without this property.
  5. Comparison of recomputation algorithms (RA) and incremental algorithms (IA): 1. Performance: a) RA: requires computational effort to process the entire master dataset. b) IA: requires fewer computational resources but may generate much larger batch views. 2. Human-fault tolerance: a) RA: extremely tolerant of human errors because the batch views are continually rebuilt. b) IA: doesn’t facilitate repairing errors in the batch views; repairs are ad hoc and may require estimates. 3. Generality: a) RA: complexity of the algorithm is addressed during precomputation, resulting in simple batch views and low-latency, on-the-fly processing. b) IA: requires special tailoring; may shift complexity to on-the-fly query processing. 4. Conclusion: a) RA: essential to supporting a robust data-processing system. b) IA: can increase the efficiency of your system, but only as a supplement to recomputation algorithms.
  6. 1. The power of the Lambda Architecture lies in the separation of roles between the different layers. 2. The speed layer is one of the core layers of this architecture. It fills the gap left by the batch layer: combining the speed layer views with the batch views lets us run any ad hoc query over all the data, that is, query = function(all data). 3. We can also let realtime views expire; for example, in memcache we set an expiry time for key/value pairs.
  7. 1. Random reads: this means the data the view contains must be indexed. 2. Scalability: typically this implies that realtime views can be distributed across many machines. Nowadays sharding is widely used to meet the scalability requirement in databases. 3. Fault tolerance: fault tolerance is accomplished by replicating data across machines so there are backups should a single machine fail.
  8. 1. The serving layer views being slightly out of date is not a concern, because the speed layer is responsible for any data not yet available in the serving layer. 2. We can also write an on-the-fly query function in the serving layer to give low-latency query results.
  9. 1. Human fault tolerance: a) If there is a bug in a batch job, discard the batch view and recompute it. b) If there is bad data in the master dataset, discard the bad data and reprocess the remaining data. Because the master dataset is immutable and append-only, bad data can be discarded easily. c) If there is a bug in a query, redeploy the query layer. 2. In the Lambda Architecture we can use a different algorithm in each layer; for example, an exact search algorithm in the batch layer and an approximate search algorithm in the speed layer. 3. Under the Lambda Architecture, results from the batch layer are always eventually consistent. As soon as a fresh batch update is completed, results from the batch layer are consistent.
  10. 1. Spark also provides built-in implementations of common ML algorithms such as classification, regression, clustering, and collaborative filtering. 3. We didn’t need to re-implement the business logic, nor test and maintain a second code base.
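
The last note, about not re-implementing the business logic, can be illustrated with a short sketch. It is written in plain Python and does not use the Spark API; the event records and the helpers `count_by_key` and `micro_batches` are hypothetical names chosen for the example. The idea it imitates is that Spark Streaming processes a stream as a sequence of small batches, so one aggregation function can serve both the batch job and the streaming job.

```python
from collections import Counter
from itertools import islice


def count_by_key(events):
    """Shared business logic (hypothetical): count events per key."""
    return Counter(key for key, _ in events)


def micro_batches(stream, size):
    """Split an iterable into consecutive lists of at most `size` items."""
    iterator = iter(stream)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Made-up sample data: (event type, timestamp) pairs.
historical = [("click", 1), ("view", 2), ("click", 3)]
incoming = [("click", 4), ("view", 5), ("view", 6)]

# Batch path: apply the aggregate once to the historical data.
batch_counts = count_by_key(historical)

# Streaming path: apply the same aggregate to each small batch of new events.
stream_counts = Counter()
for chunk in micro_batches(incoming, size=2):
    stream_counts.update(count_by_key(chunk))

total_counts = batch_counts + stream_counts
print(total_counts)
```

Only `count_by_key` contains business logic, so there is a single function to test and maintain; the batch and streaming paths differ only in how they feed data to it.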