A real-time architecture using
     Hadoop and Storm.
Speakers




 Nathan Bijnens                     Geert Van Landeghem
 @nathan_gs                         @gvanlandeghem


                  A real-time architecture using Hadoop & Storm.   2
Our Vision



 Volume
  Big Data


Big Data




   Velocity
Our Vision



 Volume



                                       Variety
Credits
Nathan Marz
 Engineer at Backtype
 (now Twitter).
 Storm
 Cascalog
 ElephantDB

                                    manning.com/marz
A Data System

Data is more than Information


           Not all information is equal.
       Some information is derived from other pieces of
                       information.




Data is more than Information


    Eventually you will reach the most raw form of information.

    This is the information you hold true, simply because it exists.




Events
Everything we do generates events:
-   Pay with Credit Card
-   Commit to Git
-   Click on a webpage
-   Tweet




Events - Before



      Events used to manipulate the
               master data.



Events - After



       Today, events are the master
                  data.



Data System




                   everything.



Events




         Data is Immutable



Events




         Data is Time Based



Capturing change traditionally

Person   Location                     Person   Location
Nathan   Antwerp                      Nathan   Ghent
Geert    Dendermonde                  Geert    Dendermonde
John     Ghent                        John     Ghent




Capturing change

Person   Location      Time                Person   Location      Time
Nathan   Antwerp       2005-01-01          Nathan   Antwerp       2005-01-01
Geert    Dendermonde   2011-10-08          Geert    Dendermonde   2011-10-08
John     Ghent         2010-05-02          John     Ghent         2010-05-02
                                           Nathan   Ghent         2013-02-03




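The timestamped table above can be read as an append-only log of immutable facts. The following is a minimal Python sketch of that idea, not code from the talk; the names `LocationFact`, `record_move` and `current_location` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LocationFact:
    """One immutable, timestamped fact: `person` lives in `location` since `since`."""
    person: str
    location: str
    since: date


# The master dataset is append-only: existing facts are never updated in place.
master_dataset = [
    LocationFact("Nathan", "Antwerp", date(2005, 1, 1)),
    LocationFact("Geert", "Dendermonde", date(2011, 10, 8)),
    LocationFact("John", "Ghent", date(2010, 5, 2)),
]


def record_move(facts, person, location, since):
    """Capture a change by appending a new fact instead of overwriting the old one."""
    return facts + [LocationFact(person, location, since)]


def current_location(facts, person):
    """Derive the current location from the most recent fact for `person`."""
    own = [fact for fact in facts if fact.person == person]
    return max(own, key=lambda fact: fact.since).location if own else None


master_dataset = record_move(master_dataset, "Nathan", "Ghent", date(2013, 2, 3))
```

Because the old fact stays in the log, both the current location and the full history remain available.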
Query


 The data you query is often transformed,
             aggregated, ...



Query




 Query = function ( data )

Number of people living in each city.

 Person   Location      Time                Location      Count
 Nathan   Antwerp       2005-01-01          Ghent         2
 Geert    Dendermonde   2011-10-08          Dendermonde   1
 John     Ghent         2010-05-02
 Nathan   Ghent         2013-02-03




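The "Query = function ( data )" idea can be sketched for this example as a pure function over all facts. This is an illustrative Python sketch under the assumption that only each person's most recent location counts; the function name `people_per_city` is hypothetical.

```python
from collections import Counter

# (person, location, time) facts from the table; ISO dates compare correctly as strings.
ALL_DATA = [
    ("Nathan", "Antwerp", "2005-01-01"),
    ("Geert", "Dendermonde", "2011-10-08"),
    ("John", "Ghent", "2010-05-02"),
    ("Nathan", "Ghent", "2013-02-03"),
]


def people_per_city(facts):
    """Query = function(all data): count people by their most recent location."""
    latest = {}
    for person, location, time in facts:
        # Keep only the newest fact per person.
        if person not in latest or time > latest[person][0]:
            latest[person] = (time, location)
    return Counter(location for _time, location in latest.values())


result = people_per_city(ALL_DATA)
```

With the four facts above, Nathan is counted in Ghent only, which matches the counts shown in the table.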
Query



        All Data                                             Query



Query: Precompute



     All Data            Precomputed View            Query



Layered Architecture

                 Batch Layer


                Speed Layer


                Serving Layer

Layered Architecture
[Diagram: Incoming Data, Hadoop, ElephantDB, Cassandra, Query]
Batch Layer

Batch Layer



[Diagram: Incoming Data, Hadoop, ElephantDB]
Batch Layer



        Unrestrained computation.



Batch Layer



              Horizontally scalable.



Batch Layer



              High Latency.
                  matter.




Batch Layer


     Stores master copy of data set...

              append only.


Batch Layer




Batch: View generation

[Diagram: Master Dataset, MapReduce, View #1, View #2, View #3]
MapReduce
  MAP
           1. Take a large problem and divide it into sub-problems.
           2. Perform the same function on all sub-problems:
                 DoWork()        DoWork()             DoWork()        …
  REDUCE
           3. Combine the output from all sub-problems.
                 Output
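The three steps above can be sketched in plain Python. This is a single-process illustration of the MapReduce pattern, not the Hadoop API; the helper names (`map_phase`, `shuffle`, `reduce_phase`, `map_reduce`) and the sample events are assumptions made for the example.

```python
from collections import defaultdict
from functools import reduce


def map_phase(records, mapper):
    """Steps 1 and 2: treat each record as a sub-problem and apply the same function."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group intermediate (key, value) pairs by key before reducing."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Step 3: combine the values of each key into one output value."""
    return {key: reduce(reducer, values) for key, values in groups.items()}


def map_reduce(records, mapper, reducer):
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)


# Example: count events per type, producing a simple batch view.
events = ["click", "tweet", "click", "commit", "click"]
counts = map_reduce(events, lambda event: [(event, 1)], lambda a, b: a + b)
```

In a real cluster the map and reduce calls would run in parallel on different machines; here they run sequentially to keep the sketch self-contained.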
Batch View Database



          Read only database.
            No random writes required.




Batch View Database
ElephantDB
Splout




Batch Layer
[Timeline up to Now: data absorbed into Batch Views, followed by just a few hours of data not yet absorbed.]
Speed Layer

Overview
[Diagram: Incoming Data, Hadoop, ElephantDB, Cassandra]
Speed Layer




              Stream processing.



Speed Layer




        Continuous computation.



Speed Layer




              Transactional.



Speed Layer



    Storing a limited window of data.
       Compensating for the last few hours of data.




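A speed-layer view that only holds a limited window of data could look roughly like the sketch below. It is a simplified in-memory illustration, assuming integer timestamps and a hypothetical `RealtimeView` class; a real deployment would keep this state in a read & write database.

```python
from collections import Counter


class RealtimeView:
    """Hypothetical speed-layer view: counts per key over recent events only."""

    def __init__(self):
        # (timestamp, key) pairs not yet covered by a batch view.
        self._recent = []

    def update(self, timestamp, key):
        """Incrementally absorb one incoming event."""
        self._recent.append((timestamp, key))

    def expire(self, absorbed_until):
        """Drop events the batch layer has absorbed (timestamp <= absorbed_until)."""
        self._recent = [(t, k) for t, k in self._recent if t > absorbed_until]

    def counts(self):
        """Current counts over the events still held in the window."""
        return Counter(key for _timestamp, key in self._recent)


view = RealtimeView()
view.update(100, "click")
view.update(200, "tweet")
view.update(300, "click")
view.expire(absorbed_until=150)  # the batch views now cover everything up to 150
```

Expiring absorbed events is what keeps the window limited to the last few hours that the batch views do not yet cover.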
Speed Layer


 All the complexity is isolated in the Speed layer. … auto-corrected.


CAP
You have a choice between:
  Availability
 - Queries are eventually consistent.
  Consistency
 - Queries are consistent.




Eventual accuracy


Some algorithms are hard to implement in
   real time. For those cases we could
           estimate the results.


Speed Layer
[Diagram: Incoming Data, Real Time View 1, Real Time View 2]
Storm
Message passing.
Distributed processing.
Horizontally scalable.
Incremental algorithms.
Fast.

Data in motion.
Storm

[Diagram: Nimbus and Zookeeper above three Worker Nodes; each Worker Node has a Supervisor and several Workers.]
Storm
Tuple




Stream



Storm
Spout




Bolt



Storm
Grouping




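The Storm terms from the last slides (tuple, stream, spout, bolt, grouping) can be illustrated with a toy, single-process Python sketch. It deliberately does not use the Storm API; `event_spout`, `CountBolt`, `fields_grouping` and `run_topology` are hypothetical names, and the routing by CRC32 hash is only an assumption about how a fields grouping might be imitated.

```python
import zlib
from collections import Counter


def event_spout():
    """Spout: the source of a stream. Here it emits a fixed list of tuples."""
    for user, action in [("nathan", "tweet"), ("geert", "click"),
                         ("nathan", "click"), ("geert", "click")]:
        yield (user, action)


class CountBolt:
    """Bolt: consumes tuples and keeps an incrementally updated count per user."""

    def __init__(self):
        self.counts = Counter()

    def execute(self, tup):
        user, _action = tup
        self.counts[user] += 1


def fields_grouping(tup, field_index, num_tasks):
    """Grouping: route tuples with the same field value to the same bolt task."""
    key = str(tup[field_index]).encode("utf-8")
    return zlib.crc32(key) % num_tasks


def run_topology(spout, num_tasks=2):
    """Wire the spout to `num_tasks` bolt tasks and process the whole stream."""
    tasks = [CountBolt() for _ in range(num_tasks)]
    for tup in spout:
        tasks[fields_grouping(tup, 0, num_tasks)].execute(tup)
    return tasks


tasks = run_topology(event_spout())
total = sum((task.counts for task in tasks), Counter())
```

Because all tuples for one user land on the same task, each task can keep its counts locally without coordinating with the others.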
Speed Layer Views
The views are stored in a read & write database.
-   Cassandra
-   HBase
-   MongoDB
-   MySQL
-   ElasticSearch
Much more complex than a read-only view.

Serving Layer

Overview
[Diagram: Incoming Data, Hadoop, ElephantDB, Cassandra, Query]
Serving Layer



  This layer queries the Batch & Real Time
            views and merges them.



Serving Layer

[Diagram: Batch Views, Real Time Views, Merge]
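For an additive metric such as a count, the merge step might be sketched as follows. This is an assumption-laden Python illustration: `merge_views` and the sample view contents are hypothetical, and non-additive queries (for example distinct counts) would need a different merge function.

```python
from collections import Counter


def merge_views(batch_view, realtime_view):
    """Serve a query by adding the real-time counts on top of the batch counts."""
    merged = Counter(batch_view)
    merged.update(realtime_view)  # Counter.update adds counts key by key
    return dict(merged)


# Batch view: everything absorbed so far. Real-time view: the last few hours.
batch_view = {"Ghent": 2, "Dendermonde": 1}
realtime_view = {"Ghent": 1, "Antwerp": 1}

answer = merge_views(batch_view, realtime_view)
```

The batch view stays read-only in this sketch; only the small real-time view changes between batch runs.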
Overview

Overview
[Diagram: Incoming Data, Hadoop, ElephantDB, Cassandra, Query]
Lambda Architecture
You can discard any view, batch or real time, and just
recreate everything from the master data.
Mistakes are corrected via recomputation.
- Write bad data? Remove the data & recompute.
- Bug in view generation? Just recompute the view.
Data storage is highly optimized.



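Correcting a mistake via recomputation can be illustrated with a small Python sketch. The record layout, the misspelled `"clikc"` event and the `build_view` function are made up for the example; the point is only that a view is a pure function of the master data and can be thrown away and rebuilt.

```python
from collections import Counter


def build_view(master_data):
    """A view is derived purely from the master data, so it can always be rebuilt."""
    return Counter(event["type"] for event in master_data)


master_data = [
    {"id": 1, "type": "click"},
    {"id": 2, "type": "tweet"},
    {"id": 3, "type": "clikc"},  # bad record written by mistake
]

stale_view = build_view(master_data)

# Write bad data? Remove the data and recompute the view from scratch.
master_data = [event for event in master_data if event["id"] != 3]
view = build_view(master_data)
```

A bug in `build_view` itself would be handled the same way: fix the function and rebuild the view from the unchanged master data.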
Recommendations

Serialization & Schema



      Catch errors as quickly as they happen.
               Validation on write vs on read.




Serialization & Schema



  CSV is actually a serialization language that is just
                    poorly defined.




Serialization & Schema
 Use a format with a schema.
- Thrift
- Avro
- Protocol Buffers




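Validation on write can be sketched with a typed record that rejects malformed events before they reach the master dataset. The example below uses a plain Python dataclass as a stand-in for a schema; it is not the Thrift, Avro or Protocol Buffers API, and `LocationEvent` and `write_event` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LocationEvent:
    """Stand-in schema: every stored event must satisfy these field checks."""
    person: str
    location: str
    time: date

    def __post_init__(self):
        if not isinstance(self.person, str) or not self.person:
            raise ValueError("person must be a non-empty string")
        if not isinstance(self.location, str) or not self.location:
            raise ValueError("location must be a non-empty string")
        if not isinstance(self.time, date):
            raise ValueError("time must be a datetime.date")


def write_event(store, person, location, time):
    """Validate on write: a malformed event raises before it is appended."""
    store.append(LocationEvent(person, location, time))
    return store


store = []
write_event(store, "Nathan", "Ghent", date(2013, 2, 3))

try:
    write_event(store, "Nathan", "Ghent", "2013-02-03")  # a string, not a date
    rejected = False
except ValueError:
    rejected = True
```

The error surfaces at the moment of the bad write, rather than later when some job reads the record.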
Questions?



             What are your needs?
             @nathan_gs & @gvanlandeghem




DataCrunchers

We enable companies in envisioning, defining and
implementing a data strategy.

A one-stop-shop for all your Big Data needs.

The first Big Data Consultancy agency in Belgium.

Jobs



         We are hiring.
       jobs@datacrunchers.eu





More Related Content

What's hot

Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...Brian O'Neill
 
Reference architecture for Internet Of Things
Reference architecture for Internet Of ThingsReference architecture for Internet Of Things
Reference architecture for Internet Of Thingselephantscale
 
Real time big data analytics with Storm by Ron Bodkin of Think Big Analytics
Real time big data analytics with Storm by Ron Bodkin of Think Big AnalyticsReal time big data analytics with Storm by Ron Bodkin of Think Big Analytics
Real time big data analytics with Storm by Ron Bodkin of Think Big AnalyticsData Con LA
 
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDFGPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDFKeith Kraus
 
From stream to recommendation using apache beam with cloud pubsub and cloud d...
From stream to recommendation using apache beam with cloud pubsub and cloud d...From stream to recommendation using apache beam with cloud pubsub and cloud d...
From stream to recommendation using apache beam with cloud pubsub and cloud d...Neville Li
 
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Codemotion
 
The Secrets of Building Realtime Big Data Systems
The Secrets of Building Realtime Big Data SystemsThe Secrets of Building Realtime Big Data Systems
The Secrets of Building Realtime Big Data Systemsnathanmarz
 
Programmatic Bidding Data Streams & Druid
Programmatic Bidding Data Streams & DruidProgrammatic Bidding Data Streams & Druid
Programmatic Bidding Data Streams & DruidCharles Allen
 
Hadoop application architectures - using Customer 360 as an example
Hadoop application architectures - using Customer 360 as an exampleHadoop application architectures - using Customer 360 as an example
Hadoop application architectures - using Customer 360 as an examplehadooparchbook
 
Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016Guido Schmutz
 
Accelerating Data Science With GPUs
Accelerating Data Science With GPUsAccelerating Data Science With GPUs
Accelerating Data Science With GPUsiguazio
 
Apache storm vs. Spark Streaming
Apache storm vs. Spark StreamingApache storm vs. Spark Streaming
Apache storm vs. Spark StreamingP. Taylor Goetz
 
GCP Meetup #3 - Approaches to Cloud Native Architectures
GCP Meetup #3 - Approaches to Cloud Native ArchitecturesGCP Meetup #3 - Approaches to Cloud Native Architectures
GCP Meetup #3 - Approaches to Cloud Native Architecturesnine
 
Build a Time Series Application with Apache Spark and Apache HBase
Build a Time Series Application with Apache Spark and Apache  HBaseBuild a Time Series Application with Apache Spark and Apache  HBase
Build a Time Series Application with Apache Spark and Apache HBaseCarol McDonald
 
Big Data, Mob Scale.
Big Data, Mob Scale.Big Data, Mob Scale.
Big Data, Mob Scale.darach
 
Resilience: the key requirement of a [big] [data] architecture - StampedeCon...
Resilience: the key requirement of a [big] [data] architecture  - StampedeCon...Resilience: the key requirement of a [big] [data] architecture  - StampedeCon...
Resilience: the key requirement of a [big] [data] architecture - StampedeCon...StampedeCon
 
July 2014 HUG : Pushing the limits of Realtime Analytics using Druid
July 2014 HUG : Pushing the limits of Realtime Analytics using DruidJuly 2014 HUG : Pushing the limits of Realtime Analytics using Druid
July 2014 HUG : Pushing the limits of Realtime Analytics using DruidYahoo Developer Network
 
Real-time analytics with Druid at Appsflyer
Real-time analytics with Druid at AppsflyerReal-time analytics with Druid at Appsflyer
Real-time analytics with Druid at AppsflyerMichael Spector
 
ASGARD Splunk Conf 2016
ASGARD Splunk Conf 2016ASGARD Splunk Conf 2016
ASGARD Splunk Conf 2016Keith Kraus
 

What's hot (20)

Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
 
Reference architecture for Internet Of Things
Reference architecture for Internet Of ThingsReference architecture for Internet Of Things
Reference architecture for Internet Of Things
 
Real time big data analytics with Storm by Ron Bodkin of Think Big Analytics
Real time big data analytics with Storm by Ron Bodkin of Think Big AnalyticsReal time big data analytics with Storm by Ron Bodkin of Think Big Analytics
Real time big data analytics with Storm by Ron Bodkin of Think Big Analytics
 
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDFGPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
 
From stream to recommendation using apache beam with cloud pubsub and cloud d...
From stream to recommendation using apache beam with cloud pubsub and cloud d...From stream to recommendation using apache beam with cloud pubsub and cloud d...
From stream to recommendation using apache beam with cloud pubsub and cloud d...
 
Rapids: Data Science on GPUs
Rapids: Data Science on GPUsRapids: Data Science on GPUs
Rapids: Data Science on GPUs
 
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
 
The Secrets of Building Realtime Big Data Systems
The Secrets of Building Realtime Big Data SystemsThe Secrets of Building Realtime Big Data Systems
The Secrets of Building Realtime Big Data Systems
 
Programmatic Bidding Data Streams & Druid
Programmatic Bidding Data Streams & DruidProgrammatic Bidding Data Streams & Druid
Programmatic Bidding Data Streams & Druid
 
Hadoop application architectures - using Customer 360 as an example
Hadoop application architectures - using Customer 360 as an exampleHadoop application architectures - using Customer 360 as an example
Hadoop application architectures - using Customer 360 as an example
 
Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016
 
Accelerating Data Science With GPUs
Accelerating Data Science With GPUsAccelerating Data Science With GPUs
Accelerating Data Science With GPUs
 
Apache storm vs. Spark Streaming
Apache storm vs. Spark StreamingApache storm vs. Spark Streaming
Apache storm vs. Spark Streaming
 
GCP Meetup #3 - Approaches to Cloud Native Architectures
GCP Meetup #3 - Approaches to Cloud Native ArchitecturesGCP Meetup #3 - Approaches to Cloud Native Architectures
GCP Meetup #3 - Approaches to Cloud Native Architectures
 
Build a Time Series Application with Apache Spark and Apache HBase
Build a Time Series Application with Apache Spark and Apache  HBaseBuild a Time Series Application with Apache Spark and Apache  HBase
Build a Time Series Application with Apache Spark and Apache HBase
 
Big Data, Mob Scale.
Big Data, Mob Scale.Big Data, Mob Scale.
Big Data, Mob Scale.
 
Resilience: the key requirement of a [big] [data] architecture - StampedeCon...
Resilience: the key requirement of a [big] [data] architecture  - StampedeCon...Resilience: the key requirement of a [big] [data] architecture  - StampedeCon...
Resilience: the key requirement of a [big] [data] architecture - StampedeCon...
 
July 2014 HUG : Pushing the limits of Realtime Analytics using Druid
July 2014 HUG : Pushing the limits of Realtime Analytics using DruidJuly 2014 HUG : Pushing the limits of Realtime Analytics using Druid
July 2014 HUG : Pushing the limits of Realtime Analytics using Druid
 
Real-time analytics with Druid at Appsflyer
Real-time analytics with Druid at AppsflyerReal-time analytics with Druid at Appsflyer
Real-time analytics with Druid at Appsflyer
 
ASGARD Splunk Conf 2016
ASGARD Splunk Conf 2016ASGARD Splunk Conf 2016
ASGARD Splunk Conf 2016
 

Similar to A real time architecture using Hadoop and Storm @ FOSDEM 2013

Realtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and HadoopRealtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and HadoopDataWorks Summit
 
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...Cedric CARBONE
 
Realtime Computation with Storm
Realtime Computation with StormRealtime Computation with Storm
Realtime Computation with Stormboorad
 
Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...
Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...
Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...HostedbyConfluent
 
Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...
Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...
Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...HostedbyConfluent
 
Realtime Computation with Storm
Realtime Computation with StormRealtime Computation with Storm
Realtime Computation with Stormboorad
 
Cloud computing and Hadoop introduction
Cloud computing and Hadoop introductionCloud computing and Hadoop introduction
Cloud computing and Hadoop introductionchristian.perez
 
Storm at spider.io - London Storm Meetup 2013-06-18
Storm at spider.io - London Storm Meetup 2013-06-18Storm at spider.io - London Storm Meetup 2013-06-18
Storm at spider.io - London Storm Meetup 2013-06-18Ashley Brown
 
The elephantintheroom bigdataanalyticsinthecloud
The elephantintheroom bigdataanalyticsinthecloudThe elephantintheroom bigdataanalyticsinthecloud
The elephantintheroom bigdataanalyticsinthecloudKhazret Sapenov
 
Buzz words-dunning-real-time-learning
Buzz words-dunning-real-time-learningBuzz words-dunning-real-time-learning
Buzz words-dunning-real-time-learningTed Dunning
 
Parallel SPAM Clustering with Hadoop
Parallel SPAM Clustering with HadoopParallel SPAM Clustering with Hadoop
Parallel SPAM Clustering with HadoopThibault Debatty
 
Deeplearning on Hadoop @OSCON 2014
Deeplearning on Hadoop @OSCON 2014Deeplearning on Hadoop @OSCON 2014
Deeplearning on Hadoop @OSCON 2014Adam Gibson
 
Nagios Conference 2012 - Dave Josephsen - 2002 called they want there rrd she...
Nagios Conference 2012 - Dave Josephsen - 2002 called they want there rrd she...Nagios Conference 2012 - Dave Josephsen - 2002 called they want there rrd she...
Nagios Conference 2012 - Dave Josephsen - 2002 called they want there rrd she...Nagios
 
Hadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG GridHadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG GridEvert Lammerts
 
Lambda kappa architecture - the jury are still out
Lambda   kappa architecture - the jury are still outLambda   kappa architecture - the jury are still out
Lambda kappa architecture - the jury are still outYoav chernobroda
 
Big Data, Simple and Fast: Addressing the Shortcomings of Hadoop
Big Data, Simple and Fast: Addressing the Shortcomings of HadoopBig Data, Simple and Fast: Addressing the Shortcomings of Hadoop
Big Data, Simple and Fast: Addressing the Shortcomings of HadoopHazelcast
 
Scientific Computing @ Fred Hutch
Scientific Computing @ Fred HutchScientific Computing @ Fred Hutch
Scientific Computing @ Fred HutchDirk Petersen
 

Similar to A real time architecture using Hadoop and Storm @ FOSDEM 2013 (20)

Realtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and HadoopRealtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and Hadoop
 
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...
 
Realtime Computation with Storm
Realtime Computation with StormRealtime Computation with Storm
Realtime Computation with Storm
 
Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...
Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...
Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...
 
Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...
Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...
Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...
 
Realtime Computation with Storm
Realtime Computation with StormRealtime Computation with Storm
Realtime Computation with Storm
 
London hug
London hugLondon hug
London hug
 
Cloud computing and Hadoop introduction
Cloud computing and Hadoop introductionCloud computing and Hadoop introduction
Cloud computing and Hadoop introduction
 
Big Data & Hadoop. Simone Leo (CRS4)
Big Data & Hadoop. Simone Leo (CRS4)Big Data & Hadoop. Simone Leo (CRS4)
Big Data & Hadoop. Simone Leo (CRS4)
 
Big Trends in Big Data
Big Trends in Big DataBig Trends in Big Data
Big Trends in Big Data
 
Storm at spider.io - London Storm Meetup 2013-06-18
Storm at spider.io - London Storm Meetup 2013-06-18Storm at spider.io - London Storm Meetup 2013-06-18
Storm at spider.io - London Storm Meetup 2013-06-18
 
The elephantintheroom bigdataanalyticsinthecloud
The elephantintheroom bigdataanalyticsinthecloudThe elephantintheroom bigdataanalyticsinthecloud
The elephantintheroom bigdataanalyticsinthecloud
 
Buzz words-dunning-real-time-learning
Buzz words-dunning-real-time-learningBuzz words-dunning-real-time-learning
Buzz words-dunning-real-time-learning
 
Parallel SPAM Clustering with Hadoop
Parallel SPAM Clustering with HadoopParallel SPAM Clustering with Hadoop
Parallel SPAM Clustering with Hadoop
 
Deeplearning on Hadoop @OSCON 2014
Deeplearning on Hadoop @OSCON 2014Deeplearning on Hadoop @OSCON 2014
Deeplearning on Hadoop @OSCON 2014
 
Nagios Conference 2012 - Dave Josephsen - 2002 called they want there rrd she...
Nagios Conference 2012 - Dave Josephsen - 2002 called they want there rrd she...Nagios Conference 2012 - Dave Josephsen - 2002 called they want there rrd she...
Nagios Conference 2012 - Dave Josephsen - 2002 called they want there rrd she...
 
Hadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG GridHadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG Grid
 
Lambda kappa architecture - the jury are still out
Lambda   kappa architecture - the jury are still outLambda   kappa architecture - the jury are still out
Lambda kappa architecture - the jury are still out
 
Big Data, Simple and Fast: Addressing the Shortcomings of Hadoop
Big Data, Simple and Fast: Addressing the Shortcomings of HadoopBig Data, Simple and Fast: Addressing the Shortcomings of Hadoop
Big Data, Simple and Fast: Addressing the Shortcomings of Hadoop
 
Scientific Computing @ Fred Hutch
Scientific Computing @ Fred HutchScientific Computing @ Fred Hutch
Scientific Computing @ Fred Hutch
 





A real time architecture using Hadoop and Storm @ FOSDEM 2013

  • 1. A real-time architecture using Hadoop and Storm.
  • 2. Speakers: Nathan Bijnens (@nathan_gs), Geert Van Landeghem (@gvanlandeghem)
  • 3. Our Vision. Big Data: Volume
  • 4. Big Data: Velocity
  • 5. Our Vision: Volume, Variety
  • 6. Credits: Nathan Marz, engineer at BackType (now Twitter). Storm, Cascalog, ElephantDB. manning.com/marz
  • 7. A Data System
  • 8. Data is more than Information: Not all information is equal. Some information is derived from other pieces of information.
  • 9. Data is more than Information: Eventually you will reach the most This is the information you hold true, simply because it exists.
  • 10. Events: Everything we do generates events: pay with credit card, commit to Git, click on a webpage, tweet.
  • 11. Events - Before: Events used to manipulate the master data.
  • 12. Events - After: Today, events are the master data.
  • 13. Data System everything.
  • 14. Events: Data is Immutable
  • 15. Events: Data is Time Based
  • 16. Capturing change traditionally
      Before:
        Person | Location
        Nathan | Antwerp
        Geert  | Dendermonde
        John   | Ghent
      After:
        Person | Location
        Nathan | Ghent
        Geert  | Dendermonde
        John   | Ghent
  • 17. Capturing change
      Before:
        Person | Location    | Time
        Nathan | Antwerp     | 2005-01-01
        Geert  | Dendermonde | 2011-10-08
        John   | Ghent       | 2010-05-02
      After:
        Person | Location    | Time
        Nathan | Antwerp     | 2005-01-01
        Geert  | Dendermonde | 2011-10-08
        John   | Ghent       | 2010-05-02
        Nathan | Ghent       | 2013-02-03
  • 18. Query: The data you query is often transformed, aggregated, ...
  • 19. Query: Query = function(data)
  • 20. Number of people living in each city.
      Data:
        Person | Location    | Time
        Nathan | Antwerp     | 2005-01-01
        Geert  | Dendermonde | 2011-10-08
        John   | Ghent       | 2010-05-02
        Nathan | Ghent       | 2013-02-03
      Result:
        Location    | Count
        Ghent       | 2
        Dendermonde | 1
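
The "Query = function(data)" idea from slides 19 and 20 can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names `current_locations` and `people_per_city` are made up here, and the rule that a person's most recent fact wins is inferred from the slide's result table rather than stated in the deck.

```python
from collections import Counter

# Immutable, time-stamped facts copied from slide 17: (person, location, since).
FACTS = [
    ("Nathan", "Antwerp", "2005-01-01"),
    ("Geert", "Dendermonde", "2011-10-08"),
    ("John", "Ghent", "2010-05-02"),
    ("Nathan", "Ghent", "2013-02-03"),
]


def current_locations(facts):
    """Keep only the most recent location fact per person."""
    latest = {}
    for person, location, since in facts:
        # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
        if person not in latest or since > latest[person][1]:
            latest[person] = (location, since)
    return {person: loc for person, (loc, _) in latest.items()}


def people_per_city(facts):
    """The query: a pure function of all the data."""
    return dict(Counter(current_locations(facts).values()))


print(people_per_city(FACTS))  # {'Ghent': 2, 'Dendermonde': 1}
```

Because nothing is ever updated in place, the same function applied to only the first three facts still yields the older answer, with Nathan counted in Antwerp.
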
  • 21. Query: All Data -> Query
  • 22. Query: Precompute. All Data -> Precomputed View -> Query
  • 23. Layered Architecture: Batch Layer, Speed Layer, Serving Layer
  • 24. Layered Architecture (diagram): Cassandra, Query, Incoming Data, Hadoop, ElephantDB
  • 25. Batch Layer
  • 26. Batch Layer (diagram): Incoming Data, Hadoop, ElephantDB
  • 27. Batch Layer: Unrestrained computation.
  • 28. Batch Layer: Horizontally scalable.
  • 29. Batch Layer: High Latency. matter.
  • 30. Batch Layer: Stores the master copy of the data set... append only.
  • 31. Batch Layer
  • 32. Batch: View generation. Master Dataset -> MapReduce -> View #1, View #2, View #3
  • 33. MapReduce: 1. Take a large problem and divide it into sub-problems. 2. MAP: perform the same function on all sub-problems (DoWork(), DoWork(), DoWork(), …). 3. REDUCE: combine the output from all sub-problems into the output.
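
The three MapReduce steps on slide 33 can be mimicked in a single Python process. This is a toy sketch, not Hadoop code: `map_phase`, `shuffle`, `reduce_phase` and `run_mapreduce` are hypothetical names, and the input splits are assumed to already hold each person's current location.

```python
from collections import defaultdict


def map_phase(record):
    """MAP: run the same function on every record, emitting (key, value) pairs."""
    _person, location = record
    yield (location, 1)


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """REDUCE: combine all values for one key into a single output."""
    return key, sum(values)


def run_mapreduce(splits):
    # Step 1: the input is already divided into splits (sub-problems).
    mapped = [pair for split in splits for record in split for pair in map_phase(record)]
    # Steps 2 and 3: group by key, then reduce each group.
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


splits = [
    [("Nathan", "Ghent"), ("Geert", "Dendermonde")],
    [("John", "Ghent")],
]
print(run_mapreduce(splits))  # {'Ghent': 2, 'Dendermonde': 1}
```

In a real cluster each split would be mapped on a different machine; the sketch only shows the data flow.
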
  • 34. Batch View Database: Read-only database. No random writes required.
  • 35. Batch View Database: ElephantDB, Splout
  • 36. Batch Layer (timeline ending at Now): Data absorbed into Batch Views, followed by data not yet absorbed. Just a few hours of data.
  • 37. Speed Layer
  • 38. Overview (diagram): Cassandra, Incoming Data, Hadoop, ElephantDB
  • 39. Speed Layer: Stream processing.
  • 40. Speed Layer: Continuous computation.
  • 41. Speed Layer: Transactional.
  • 42. Speed Layer: Storing a limited window of data. Compensating for the last few hours of data.
  • 43. Speed Layer: All the complexity is isolated in the Speed layer auto-corrected.
  • 44. CAP: You have a choice between: Availability - Queries are eventually consistent. Consistency - Queries are consistent.
  • 45. Eventual accuracy: Some algorithms are hard to implement in real time. For those cases we could estimate the results.
  • 46. Speed Layer (diagram): Incoming Data, Real Time View 1, Real Time View 2
  • 47. Storm: Message passing. Distributed processing. Horizontally scalable. Incremental algorithms. Fast. Data in motion.
  • 48. Storm: Message passing. Distributed processing. Horizontally scalable. Incremental algorithms. Fast. Data in motion.
  • 49. Storm (cluster diagram): Nimbus, ZooKeeper, three Supervisors, nine Workers, three Worker Nodes
  • 50. Storm: Tuple, Stream
  • 51. Storm: Spout, Bolt
  • 52. Storm: Grouping
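
The Storm vocabulary on slides 50 to 52 (tuple, stream, spout, bolt, grouping) can be illustrated with a plain-Python simulation. This does not use the Storm API: `location_spout`, `CountBolt`, `fields_grouping`, `run_topology` and `NUM_TASKS` are all hypothetical stand-ins, and the CRC32-based routing is just one assumed way to send equal field values to the same task.

```python
import zlib
from collections import Counter

NUM_TASKS = 2  # hypothetical number of parallel bolt tasks


def location_spout():
    """Stand-in for a spout: the source of a stream of tuples."""
    for tup in [("Nathan", "Ghent"), ("Geert", "Dendermonde"), ("John", "Ghent")]:
        yield tup


class CountBolt:
    """Stand-in for a bolt: processes one tuple at a time, incrementally."""

    def __init__(self):
        self.counts = Counter()

    def execute(self, tup):
        _person, location = tup
        self.counts[location] += 1


def fields_grouping(tup, num_tasks):
    """Route tuples with the same location to the same bolt task."""
    # crc32 is deterministic across runs, unlike Python's built-in str hash.
    return zlib.crc32(tup[1].encode("utf-8")) % num_tasks


def run_topology(stream, num_tasks=NUM_TASKS):
    bolts = [CountBolt() for _ in range(num_tasks)]
    for tup in stream:
        bolts[fields_grouping(tup, num_tasks)].execute(tup)
    return bolts


bolts = run_topology(location_spout())
totals = sum((bolt.counts for bolt in bolts), Counter())
print(dict(totals))
```

The point of grouping on the location field is that each city's running count lives in exactly one task, so no cross-task coordination is needed to update it.
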
  • 53. Speed Layer Views: The views are stored in a read & write database: Cassandra, HBase, MongoDB, MySQL, ElasticSearch. Much more complex than a read-only view.
  • 54. Serving Layer
  • 55. Overview (diagram): Cassandra, Query, Incoming Data, Hadoop, ElephantDB
  • 56. Serving Layer: This layer queries the Batch & Real Time views and merges them.
  • 57. Serving Layer (diagram): Batch Views and Real Time Views feed into Merge
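
The merge step on slides 56 and 57 might look like the following for a simple counting view. It is a sketch under two assumptions not spelled out in the deck: the views hold additive counts, and the real-time view covers only data the batch view has not yet absorbed. The name `merge_views` and the sample numbers are invented for illustration.

```python
def merge_views(batch_view, realtime_view):
    """Combine a batch view with the real-time increments seen since the last batch run."""
    merged = dict(batch_view)  # copy, so neither input view is modified
    for key, delta in realtime_view.items():
        merged[key] = merged.get(key, 0) + delta
    return merged


# Hypothetical contents of the two views.
batch_view = {"Ghent": 2, "Dendermonde": 1}   # precomputed by the batch layer
realtime_view = {"Ghent": 1, "Antwerp": 1}    # last few hours, from the speed layer

print(merge_views(batch_view, realtime_view))
# {'Ghent': 3, 'Dendermonde': 1, 'Antwerp': 1}
```

Views that are not additive (for example distinct counts) would need a different merge function, which is part of why the speed layer carries most of the complexity.
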
  • 58. Overview
  • 59. Overview (diagram): Cassandra, Query, Incoming Data, Hadoop, ElephantDB
  • 60. Lambda Architecture: You can discard any view, batch or real time, and just recreate everything from the master data. Mistakes are corrected via recomputation. Wrote bad data? Remove the data & recompute. Bug in view generation? Just recompute the view. Data storage is highly optimized.
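
Slide 60's "remove the data & recompute" can be shown with a tiny example. Everything here is hypothetical: `build_view` is an assumed view-generation function, and the misspelled record for "Bob" is invented purely to represent bad data.

```python
def build_view(master):
    """View generation as a pure function of the master dataset."""
    view = {}
    for _person, location in master:
        view[location] = view.get(location, 0) + 1
    return view


# Hypothetical master dataset containing one bad record (misspelled city).
master = [
    ("Nathan", "Ghent"),
    ("Geert", "Dendermonde"),
    ("John", "Ghent"),
    ("Bob", "Ghnt"),
]
bad_view = build_view(master)

# Wrote bad data? Remove it from the master data and recompute the view from scratch.
cleaned = [record for record in master if record != ("Bob", "Ghnt")]
good_view = build_view(cleaned)

print(good_view)  # {'Ghent': 2, 'Dendermonde': 1}
```

Since the view holds no state of its own, there is nothing to patch: the old view is simply thrown away.
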
  • 61. Recommendations
  • 62. Serialization & Schema: Catch errors as soon as they happen. Validation on write vs. on read.
  • 63. Serialization & Schema: CSV is actually a serialization language that is just poorly defined.
  • 64. Serialization & Schema: Use a format with a schema: Thrift, Avro, Protocol Buffers.
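
"Validation on write" from slide 62 can be approximated in plain Python. The deck recommends Thrift, Avro or Protocol Buffers; the dataclass below is only a standard-library stand-in for such a schema, and `LocationFact` and `parse_csv_row` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LocationFact:
    """A minimal typed record; frozen, so a fact cannot be changed once written."""
    person: str
    location: str
    since: date

    def __post_init__(self):
        # Validate at construction time, i.e. on write, not when the data is read back.
        if not isinstance(self.person, str) or not self.person:
            raise ValueError("person must be a non-empty string")
        if not isinstance(self.location, str) or not self.location:
            raise ValueError("location must be a non-empty string")
        if not isinstance(self.since, date):
            raise TypeError("since must be a datetime.date")


def parse_csv_row(row):
    """Turn a loosely defined CSV row into a validated fact, failing fast on bad input."""
    person, location, since = row.split(",")  # wrong field count raises ValueError
    return LocationFact(person.strip(), location.strip(), date.fromisoformat(since.strip()))


print(parse_csv_row("Nathan,Ghent,2013-02-03"))
```

A row such as `"Nathan,Ghent,03/02/2013"` is rejected immediately with a `ValueError`, instead of surfacing later as a corrupt view.
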
  • 65. Questions? What are your needs? @nathan_gs & @gvanlandeghem
  • 66. DataCrunchers: We enable companies in envisioning, defining and implementing a data strategy. A one-stop-shop for all your Big Data needs. The first Big Data Consultancy agency in Belgium.
  • 67. Jobs: We are hiring. jobs@datacrunchers.eu