The Lambda architecture combines a batch layer, which processes all incoming data to generate batch views served with high latency; a speed layer, which processes only recent data to compensate for that latency with low-latency real-time views; and a serving layer, which merges batch and real-time views to answer queries. This document presents an example use case in which RabbitMQ handles data ingestion, Apache Spark performs batch processing, Apache Spark Streaming implements the speed layer, Apache Shark powers the serving layer, and results are stored in Cassandra and presented using Tomcat and D3.
2. Introduction - Lambda Architecture
• Lambda Architecture (introduced by Nathan Marz) is a
generic, scalable, and fault-tolerant data-processing
architecture designed to satisfy the needs of a robust
system that is:
– Fault-tolerant against both hardware failures and human
mistakes; mistakes are corrected via recomputation.
– Able to serve a wide range of workloads and use cases
in which low-latency reads and updates are required.
– Built on history-optimized, immutable data storage
("immutability changes everything").
– Linearly scalable, scaling out rather than up.
4. LA High-level perspective (continued)
• All data entering the system is dispatched to both the
batch layer and the speed layer for processing.
• The batch layer has two functions: (i) managing the
master dataset (an immutable, append-only set of raw
data), and (ii) pre-computing the batch views.
• The serving layer indexes the batch views so that they
can be queried in a low-latency, ad-hoc way.
• The speed layer compensates for the high latency of
updates to the serving layer and deals with recent data
only.
• Any incoming query can be answered by merging
results from batch views and real-time views.
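The merge step above can be sketched in plain Python. This is an illustrative sketch, not the actual serving-layer code: the view structures, the `answer_query` helper, and the page-count example are all assumptions made for the illustration.

```python
# Minimal sketch of the Lambda query merge, assuming both views map a
# key (here, a page URL) to a count. All names are illustrative.

def answer_query(batch_view, realtime_view, key):
    """Merge the precomputed batch view with the real-time view.

    The batch view covers everything up to the last batch run; the
    real-time view covers only data that arrived since then.
    """
    return batch_view.get(key, 0) + realtime_view.get(key, 0)

batch_view = {"/home": 1000}   # precomputed by the batch layer
realtime_view = {"/home": 7}   # maintained by the speed layer

print(answer_query(batch_view, realtime_view, "/home"))  # 1007
```

Once a new batch run completes, the corresponding portion of the real-time view is discarded, so the merge always counts each event exactly once.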
5. Lambda use case
• Data ingestion – queue & pub/sub models are a
natural fit; RabbitMQ is used
• Use Apache Spark in the Batch Layer and Jenkins as
the scheduler
• Use Apache Spark Streaming in the Speed Layer; use
Cassandra to store the real-time results
• Adopt Apache Shark in the Serving Layer
• In the Presentation Layer, use Tomcat and D3
• (Refer to the next slide for the diagram)
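The ingestion fan-out in this pipeline sends every incoming event to both the batch layer and the speed layer. In the real deployment this would be done with RabbitMQ; the sketch below stands in plain `queue.Queue` objects so the idea is runnable without a broker, and all names are illustrative assumptions.

```python
# Sketch of the Lambda fan-out: each event is dispatched to BOTH the
# batch layer and the speed layer. RabbitMQ would do this in the real
# pipeline; plain queues are used here so no broker is needed.
from queue import Queue

batch_queue = Queue()   # feeds the Spark batch jobs (master dataset)
speed_queue = Queue()   # feeds Spark Streaming

def ingest(event):
    """Publish one event to both layers (pub/sub fan-out)."""
    batch_queue.put(event)
    speed_queue.put(event)

for e in ({"page": "/home"}, {"page": "/docs"}):
    ingest(e)

print(batch_queue.qsize(), speed_queue.qsize())  # 2 2
```

With RabbitMQ, the same effect is achieved by publishing to a fanout exchange with one queue bound per layer.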
7. Apache Spark
• Hadoop integration
• Spark interactive Shell
• The Spark analytics suite includes:
– Interactive query analysis (Shark)
– Large-scale graph processing and analysis (Bagel)
– Real-time analysis (Spark Streaming)
– Machine learning library (MLlib)
• Resilient Distributed Datasets (RDDs)
– Distributed collections of objects that can be cached in memory across a cluster
of compute nodes
– Fault tolerance is built in: RDDs are automatically rebuilt from their lineage if
something goes wrong
• Distributed Operators
• Spark is already used in production
• The Spark codebase is small and extensible
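The RDD programming style above can be illustrated with the classic word count. In Spark itself this would be `sc.textFile(path).flatMap(...).map(...).reduceByKey(...)`; the sketch below mirrors that dataflow in plain Python so it runs without a cluster, and the input lines are made up for the example.

```python
# Plain-Python sketch of the Spark RDD word count dataflow; the real
# version chains flatMap -> map -> reduceByKey over an RDD.
from collections import Counter
from itertools import chain

lines = ["spark makes batch views",
         "spark streaming makes realtime views"]

# flatMap: split each line into words, flattening the result
words = chain.from_iterable(line.split() for line in lines)

# map + reduceByKey: count occurrences per word
counts = Counter(words)

print(counts["spark"], counts["views"])  # 2 2
```

In Spark, each stage of this chain produces a new RDD, and the recorded lineage of transformations is what allows lost partitions to be rebuilt automatically.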
8. Apache Shark
Shark is a component of Spark, an open-source, distributed, and fault-
tolerant in-memory analytics system that can be installed on the
same cluster as Hadoop.
In particular, Shark is fully compatible with Hive and
supports HiveQL, Hive data formats, and user-defined functions. In
addition, Shark can be used to query data in HDFS, HBase, and
Amazon S3.
• Interactive SQL system for Hadoop
• In-memory column store and column compression
• Control over data partitioning => fast, distributed joins
• Fault tolerance
• SQL “optimizer”
• Machine-learning support
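A serving-layer query in this setup would be ordinary HiveQL submitted to Shark. The fragment below is a hypothetical example: the table and column names are invented for illustration, and the `_cached` table-name suffix follows Shark's convention for keeping a table in its in-memory column store.

```sql
-- Hypothetical HiveQL run through Shark; table and column names are
-- illustrative. The _cached suffix asks Shark to keep the result in
-- its in-memory column store.
CREATE TABLE pageviews_cached AS
SELECT page, COUNT(*) AS views
FROM   raw_logs
GROUP BY page;

-- Subsequent ad-hoc queries hit the cached, columnar copy
SELECT page, views
FROM   pageviews_cached
ORDER BY views DESC
LIMIT 10;
```

Because Shark speaks HiveQL, existing Hive queries and UDFs can be reused against the cached batch views without modification.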