SlideShare a Scribd company logo

A real time architecture using Hadoop and Storm @ FOSDEM 2013

1 of 67
Download to read offline
A real-time architecture using
     Hadoop and Storm.
Speakers




 Nathan Bijnens                     Geert Van Landeghem
 @nathan_gs                         @gvanlandeghem


                  A real-time architecture using Hadoop & Storm.   2
Our Vision



 Volume
  Big Data


             test   A real-time architecture using Hadoop & Storm.   3
Big Data




   Velocity
              test   A real-time architecture using Hadoop & Storm.   4
Our Vision



 Volume



             test
                                       Variety
                    A real-time architecture using Hadoop & Storm.   5
Credits
Nathan Marz
 Engineer at Backtype
 (now Twitter).
 Storm
 Cascalog
 ElephantDB

                                    manning.com/marz
                        A real-time architecture using Hadoop & Storm.   6

Recommended

Microsoft Big Data @ SQLUG 2013
Microsoft Big Data @ SQLUG 2013Microsoft Big Data @ SQLUG 2013
Microsoft Big Data @ SQLUG 2013Nathan Bijnens
 
Virdata: lessons learned from the Internet of Things and M2M Cloud Services @...
Virdata: lessons learned from the Internet of Things and M2M Cloud Services @...Virdata: lessons learned from the Internet of Things and M2M Cloud Services @...
Virdata: lessons learned from the Internet of Things and M2M Cloud Services @...Nathan Bijnens
 
a real-time architecture using Hadoop and Storm at Devoxx
a real-time architecture using Hadoop and Storm at Devoxxa real-time architecture using Hadoop and Storm at Devoxx
a real-time architecture using Hadoop and Storm at DevoxxNathan Bijnens
 
A real-time architecture using Hadoop and Storm @ JAX London
A real-time architecture using Hadoop and Storm @ JAX LondonA real-time architecture using Hadoop and Storm @ JAX London
A real-time architecture using Hadoop and Storm @ JAX LondonNathan Bijnens
 
A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...
A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...
A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...Nathan Bijnens
 
A real-time architecture using Hadoop & Storm - Nathan Bijnens & Geert Van La...
A real-time architecture using Hadoop & Storm - Nathan Bijnens & Geert Van La...A real-time architecture using Hadoop & Storm - Nathan Bijnens & Geert Van La...
A real-time architecture using Hadoop & Storm - Nathan Bijnens & Geert Van La...jaxLondonConference
 
Storm – Streaming Data Analytics at Scale - StampedeCon 2014
Storm – Streaming Data Analytics at Scale - StampedeCon 2014Storm – Streaming Data Analytics at Scale - StampedeCon 2014
Storm – Streaming Data Analytics at Scale - StampedeCon 2014StampedeCon
 
Spark Intro @ analytics big data summit
Spark  Intro @ analytics big data summitSpark  Intro @ analytics big data summit
Spark Intro @ analytics big data summitSujee Maniyam
 

More Related Content

What's hot

Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...Brian O'Neill
 
Reference architecture for Internet Of Things
Reference architecture for Internet Of ThingsReference architecture for Internet Of Things
Reference architecture for Internet Of Thingselephantscale
 
Real time big data analytics with Storm by Ron Bodkin of Think Big Analytics
Real time big data analytics with Storm by Ron Bodkin of Think Big AnalyticsReal time big data analytics with Storm by Ron Bodkin of Think Big Analytics
Real time big data analytics with Storm by Ron Bodkin of Think Big AnalyticsData Con LA
 
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDFGPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDFKeith Kraus
 
From stream to recommendation using apache beam with cloud pubsub and cloud d...
From stream to recommendation using apache beam with cloud pubsub and cloud d...From stream to recommendation using apache beam with cloud pubsub and cloud d...
From stream to recommendation using apache beam with cloud pubsub and cloud d...Neville Li
 
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Codemotion
 
The Secrets of Building Realtime Big Data Systems
The Secrets of Building Realtime Big Data SystemsThe Secrets of Building Realtime Big Data Systems
The Secrets of Building Realtime Big Data Systemsnathanmarz
 
Programmatic Bidding Data Streams & Druid
Programmatic Bidding Data Streams & DruidProgrammatic Bidding Data Streams & Druid
Programmatic Bidding Data Streams & DruidCharles Allen
 
Hadoop application architectures - using Customer 360 as an example
Hadoop application architectures - using Customer 360 as an exampleHadoop application architectures - using Customer 360 as an example
Hadoop application architectures - using Customer 360 as an examplehadooparchbook
 
Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016Guido Schmutz
 
Accelerating Data Science With GPUs
Accelerating Data Science With GPUsAccelerating Data Science With GPUs
Accelerating Data Science With GPUsiguazio
 
Apache storm vs. Spark Streaming
Apache storm vs. Spark StreamingApache storm vs. Spark Streaming
Apache storm vs. Spark StreamingP. Taylor Goetz
 
GCP Meetup #3 - Approaches to Cloud Native Architectures
GCP Meetup #3 - Approaches to Cloud Native ArchitecturesGCP Meetup #3 - Approaches to Cloud Native Architectures
GCP Meetup #3 - Approaches to Cloud Native Architecturesnine
 
Build a Time Series Application with Apache Spark and Apache HBase
Build a Time Series Application with Apache Spark and Apache  HBaseBuild a Time Series Application with Apache Spark and Apache  HBase
Build a Time Series Application with Apache Spark and Apache HBaseCarol McDonald
 
Big Data, Mob Scale.
Big Data, Mob Scale.Big Data, Mob Scale.
Big Data, Mob Scale.darach
 
Resilience: the key requirement of a [big] [data] architecture - StampedeCon...
Resilience: the key requirement of a [big] [data] architecture  - StampedeCon...Resilience: the key requirement of a [big] [data] architecture  - StampedeCon...
Resilience: the key requirement of a [big] [data] architecture - StampedeCon...StampedeCon
 
July 2014 HUG : Pushing the limits of Realtime Analytics using Druid
July 2014 HUG : Pushing the limits of Realtime Analytics using DruidJuly 2014 HUG : Pushing the limits of Realtime Analytics using Druid
July 2014 HUG : Pushing the limits of Realtime Analytics using DruidYahoo Developer Network
 
Real-time analytics with Druid at Appsflyer
Real-time analytics with Druid at AppsflyerReal-time analytics with Druid at Appsflyer
Real-time analytics with Druid at AppsflyerMichael Spector
 
ASGARD Splunk Conf 2016
ASGARD Splunk Conf 2016ASGARD Splunk Conf 2016
ASGARD Splunk Conf 2016Keith Kraus
 

What's hot (20)

Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
 
Reference architecture for Internet Of Things
Reference architecture for Internet Of ThingsReference architecture for Internet Of Things
Reference architecture for Internet Of Things
 
Real time big data analytics with Storm by Ron Bodkin of Think Big Analytics
Real time big data analytics with Storm by Ron Bodkin of Think Big AnalyticsReal time big data analytics with Storm by Ron Bodkin of Think Big Analytics
Real time big data analytics with Storm by Ron Bodkin of Think Big Analytics
 
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDFGPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
 
From stream to recommendation using apache beam with cloud pubsub and cloud d...
From stream to recommendation using apache beam with cloud pubsub and cloud d...From stream to recommendation using apache beam with cloud pubsub and cloud d...
From stream to recommendation using apache beam with cloud pubsub and cloud d...
 
Rapids: Data Science on GPUs
Rapids: Data Science on GPUsRapids: Data Science on GPUs
Rapids: Data Science on GPUs
 
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
 
The Secrets of Building Realtime Big Data Systems
The Secrets of Building Realtime Big Data SystemsThe Secrets of Building Realtime Big Data Systems
The Secrets of Building Realtime Big Data Systems
 
Programmatic Bidding Data Streams & Druid
Programmatic Bidding Data Streams & DruidProgrammatic Bidding Data Streams & Druid
Programmatic Bidding Data Streams & Druid
 
Hadoop application architectures - using Customer 360 as an example
Hadoop application architectures - using Customer 360 as an exampleHadoop application architectures - using Customer 360 as an example
Hadoop application architectures - using Customer 360 as an example
 
Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016
 
Accelerating Data Science With GPUs
Accelerating Data Science With GPUsAccelerating Data Science With GPUs
Accelerating Data Science With GPUs
 
Apache storm vs. Spark Streaming
Apache storm vs. Spark StreamingApache storm vs. Spark Streaming
Apache storm vs. Spark Streaming
 
GCP Meetup #3 - Approaches to Cloud Native Architectures
GCP Meetup #3 - Approaches to Cloud Native ArchitecturesGCP Meetup #3 - Approaches to Cloud Native Architectures
GCP Meetup #3 - Approaches to Cloud Native Architectures
 
Build a Time Series Application with Apache Spark and Apache HBase
Build a Time Series Application with Apache Spark and Apache  HBaseBuild a Time Series Application with Apache Spark and Apache  HBase
Build a Time Series Application with Apache Spark and Apache HBase
 
Big Data, Mob Scale.
Big Data, Mob Scale.Big Data, Mob Scale.
Big Data, Mob Scale.
 
Resilience: the key requirement of a [big] [data] architecture - StampedeCon...
Resilience: the key requirement of a [big] [data] architecture  - StampedeCon...Resilience: the key requirement of a [big] [data] architecture  - StampedeCon...
Resilience: the key requirement of a [big] [data] architecture - StampedeCon...
 
July 2014 HUG : Pushing the limits of Realtime Analytics using Druid
July 2014 HUG : Pushing the limits of Realtime Analytics using DruidJuly 2014 HUG : Pushing the limits of Realtime Analytics using Druid
July 2014 HUG : Pushing the limits of Realtime Analytics using Druid
 
Real-time analytics with Druid at Appsflyer
Real-time analytics with Druid at AppsflyerReal-time analytics with Druid at Appsflyer
Real-time analytics with Druid at Appsflyer
 
ASGARD Splunk Conf 2016
ASGARD Splunk Conf 2016ASGARD Splunk Conf 2016
ASGARD Splunk Conf 2016
 

Similar to A real time architecture using Hadoop and Storm @ FOSDEM 2013

Realtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and HadoopRealtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and HadoopDataWorks Summit
 
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...Cedric CARBONE
 
Realtime Computation with Storm
Realtime Computation with StormRealtime Computation with Storm
Realtime Computation with Stormboorad
 
Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...
Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...
Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...HostedbyConfluent
 
Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...
Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...
Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...HostedbyConfluent
 
Realtime Computation with Storm
Realtime Computation with StormRealtime Computation with Storm
Realtime Computation with Stormboorad
 
Cloud computing and Hadoop introduction
Cloud computing and Hadoop introductionCloud computing and Hadoop introduction
Cloud computing and Hadoop introductionchristian.perez
 
Storm at spider.io - London Storm Meetup 2013-06-18
Storm at spider.io - London Storm Meetup 2013-06-18Storm at spider.io - London Storm Meetup 2013-06-18
Storm at spider.io - London Storm Meetup 2013-06-18Ashley Brown
 
The elephantintheroom bigdataanalyticsinthecloud
The elephantintheroom bigdataanalyticsinthecloudThe elephantintheroom bigdataanalyticsinthecloud
The elephantintheroom bigdataanalyticsinthecloudKhazret Sapenov
 
Buzz words-dunning-real-time-learning
Buzz words-dunning-real-time-learningBuzz words-dunning-real-time-learning
Buzz words-dunning-real-time-learningTed Dunning
 
Parallel SPAM Clustering with Hadoop
Parallel SPAM Clustering with HadoopParallel SPAM Clustering with Hadoop
Parallel SPAM Clustering with HadoopThibault Debatty
 
Deeplearning on Hadoop @OSCON 2014
Deeplearning on Hadoop @OSCON 2014Deeplearning on Hadoop @OSCON 2014
Deeplearning on Hadoop @OSCON 2014Adam Gibson
 
Nagios Conference 2012 - Dave Josephsen - 2002 called they want there rrd she...
Nagios Conference 2012 - Dave Josephsen - 2002 called they want there rrd she...Nagios Conference 2012 - Dave Josephsen - 2002 called they want there rrd she...
Nagios Conference 2012 - Dave Josephsen - 2002 called they want there rrd she...Nagios
 
Hadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG GridHadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG GridEvert Lammerts
 
Lambda kappa architecture - the jury are still out
Lambda   kappa architecture - the jury are still outLambda   kappa architecture - the jury are still out
Lambda kappa architecture - the jury are still outYoav chernobroda
 
Big Data, Simple and Fast: Addressing the Shortcomings of Hadoop
Big Data, Simple and Fast: Addressing the Shortcomings of HadoopBig Data, Simple and Fast: Addressing the Shortcomings of Hadoop
Big Data, Simple and Fast: Addressing the Shortcomings of HadoopHazelcast
 
Scientific Computing @ Fred Hutch
Scientific Computing @ Fred HutchScientific Computing @ Fred Hutch
Scientific Computing @ Fred HutchDirk Petersen
 

Similar to A real time architecture using Hadoop and Storm @ FOSDEM 2013 (20)

Realtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and HadoopRealtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and Hadoop
 
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...
 
Realtime Computation with Storm
Realtime Computation with StormRealtime Computation with Storm
Realtime Computation with Storm
 
Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...
Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...
Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...
 
Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...
Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...
Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...
 
Realtime Computation with Storm
Realtime Computation with StormRealtime Computation with Storm
Realtime Computation with Storm
 
London hug
London hugLondon hug
London hug
 
Cloud computing and Hadoop introduction
Cloud computing and Hadoop introductionCloud computing and Hadoop introduction
Cloud computing and Hadoop introduction
 
Big Data & Hadoop. Simone Leo (CRS4)
Big Data & Hadoop. Simone Leo (CRS4)Big Data & Hadoop. Simone Leo (CRS4)
Big Data & Hadoop. Simone Leo (CRS4)
 
Big Trends in Big Data
Big Trends in Big DataBig Trends in Big Data
Big Trends in Big Data
 
Storm at spider.io - London Storm Meetup 2013-06-18
Storm at spider.io - London Storm Meetup 2013-06-18Storm at spider.io - London Storm Meetup 2013-06-18
Storm at spider.io - London Storm Meetup 2013-06-18
 
The elephantintheroom bigdataanalyticsinthecloud
The elephantintheroom bigdataanalyticsinthecloudThe elephantintheroom bigdataanalyticsinthecloud
The elephantintheroom bigdataanalyticsinthecloud
 
Buzz words-dunning-real-time-learning
Buzz words-dunning-real-time-learningBuzz words-dunning-real-time-learning
Buzz words-dunning-real-time-learning
 
Parallel SPAM Clustering with Hadoop
Parallel SPAM Clustering with HadoopParallel SPAM Clustering with Hadoop
Parallel SPAM Clustering with Hadoop
 
Deeplearning on Hadoop @OSCON 2014
Deeplearning on Hadoop @OSCON 2014Deeplearning on Hadoop @OSCON 2014
Deeplearning on Hadoop @OSCON 2014
 
Nagios Conference 2012 - Dave Josephsen - 2002 called they want there rrd she...
Nagios Conference 2012 - Dave Josephsen - 2002 called they want there rrd she...Nagios Conference 2012 - Dave Josephsen - 2002 called they want there rrd she...
Nagios Conference 2012 - Dave Josephsen - 2002 called they want there rrd she...
 
Hadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG GridHadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG Grid
 
Lambda kappa architecture - the jury are still out
Lambda   kappa architecture - the jury are still outLambda   kappa architecture - the jury are still out
Lambda kappa architecture - the jury are still out
 
Big Data, Simple and Fast: Addressing the Shortcomings of Hadoop
Big Data, Simple and Fast: Addressing the Shortcomings of HadoopBig Data, Simple and Fast: Addressing the Shortcomings of Hadoop
Big Data, Simple and Fast: Addressing the Shortcomings of Hadoop
 
Scientific Computing @ Fred Hutch
Scientific Computing @ Fred HutchScientific Computing @ Fred Hutch
Scientific Computing @ Fred Hutch
 

More from Nathan Bijnens

Data Mesh using Microsoft Fabric
Data Mesh using Microsoft FabricData Mesh using Microsoft Fabric
Data Mesh using Microsoft FabricNathan Bijnens
 
Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Nathan Bijnens
 
Dataminds - ML in Production
Dataminds - ML in ProductionDataminds - ML in Production
Dataminds - ML in ProductionNathan Bijnens
 
Azure Databricks & Spark @ Techorama 2018
Azure Databricks & Spark @ Techorama 2018Azure Databricks & Spark @ Techorama 2018
Azure Databricks & Spark @ Techorama 2018Nathan Bijnens
 
Big Data Expo '18 - Microsoft AI
Big Data Expo '18 - Microsoft AIBig Data Expo '18 - Microsoft AI
Big Data Expo '18 - Microsoft AINathan Bijnens
 
Spark on Azure, a gentle introduction (nov 2015)
Spark on Azure, a gentle introduction (nov 2015)Spark on Azure, a gentle introduction (nov 2015)
Spark on Azure, a gentle introduction (nov 2015)Nathan Bijnens
 
Cloudera, Azure and Big Data at Cloudera Meetup '17
Cloudera, Azure and Big Data at Cloudera Meetup '17Cloudera, Azure and Big Data at Cloudera Meetup '17
Cloudera, Azure and Big Data at Cloudera Meetup '17Nathan Bijnens
 
Microsoft AI at SAI '17
Microsoft AI at SAI '17Microsoft AI at SAI '17
Microsoft AI at SAI '17Nathan Bijnens
 
Microsoft Advanced Analytics @ Data Science Ghent '16
Microsoft Advanced Analytics @ Data Science Ghent '16Microsoft Advanced Analytics @ Data Science Ghent '16
Microsoft Advanced Analytics @ Data Science Ghent '16Nathan Bijnens
 
A real-time architecture using Hadoop and Storm @ BigData.be
A real-time architecture using Hadoop and Storm @ BigData.beA real-time architecture using Hadoop and Storm @ BigData.be
A real-time architecture using Hadoop and Storm @ BigData.beNathan Bijnens
 
Getting more out of your big data
Getting more out of your big dataGetting more out of your big data
Getting more out of your big dataNathan Bijnens
 
Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!Nathan Bijnens
 

More from Nathan Bijnens (12)

Data Mesh using Microsoft Fabric
Data Mesh using Microsoft FabricData Mesh using Microsoft Fabric
Data Mesh using Microsoft Fabric
 
Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)
 
Dataminds - ML in Production
Dataminds - ML in ProductionDataminds - ML in Production
Dataminds - ML in Production
 
Azure Databricks & Spark @ Techorama 2018
Azure Databricks & Spark @ Techorama 2018Azure Databricks & Spark @ Techorama 2018
Azure Databricks & Spark @ Techorama 2018
 
Big Data Expo '18 - Microsoft AI
Big Data Expo '18 - Microsoft AIBig Data Expo '18 - Microsoft AI
Big Data Expo '18 - Microsoft AI
 
Spark on Azure, a gentle introduction (nov 2015)
Spark on Azure, a gentle introduction (nov 2015)Spark on Azure, a gentle introduction (nov 2015)
Spark on Azure, a gentle introduction (nov 2015)
 
Cloudera, Azure and Big Data at Cloudera Meetup '17
Cloudera, Azure and Big Data at Cloudera Meetup '17Cloudera, Azure and Big Data at Cloudera Meetup '17
Cloudera, Azure and Big Data at Cloudera Meetup '17
 
Microsoft AI at SAI '17
Microsoft AI at SAI '17Microsoft AI at SAI '17
Microsoft AI at SAI '17
 
Microsoft Advanced Analytics @ Data Science Ghent '16
Microsoft Advanced Analytics @ Data Science Ghent '16Microsoft Advanced Analytics @ Data Science Ghent '16
Microsoft Advanced Analytics @ Data Science Ghent '16
 
A real-time architecture using Hadoop and Storm @ BigData.be
A real-time architecture using Hadoop and Storm @ BigData.beA real-time architecture using Hadoop and Storm @ BigData.be
A real-time architecture using Hadoop and Storm @ BigData.be
 
Getting more out of your big data
Getting more out of your big dataGetting more out of your big data
Getting more out of your big data
 
Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!
 

Recently uploaded

Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24
Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24
Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24Umar Saif
 
Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...
Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...
Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...Product School
 
HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...
HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...
HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...htrindia
 
Q4 2023 Quarterly Investor Presentation - FINAL.pdf
Q4 2023 Quarterly Investor Presentation - FINAL.pdfQ4 2023 Quarterly Investor Presentation - FINAL.pdf
Q4 2023 Quarterly Investor Presentation - FINAL.pdfTejal81
 
Launching New Products In Companies Where It Matters Most by Product Director...
Launching New Products In Companies Where It Matters Most by Product Director...Launching New Products In Companies Where It Matters Most by Product Director...
Launching New Products In Companies Where It Matters Most by Product Director...Product School
 
CloudStack Authentication Methods – Harikrishna Patnala, ShapeBlue
CloudStack Authentication Methods – Harikrishna Patnala, ShapeBlueCloudStack Authentication Methods – Harikrishna Patnala, ShapeBlue
CloudStack Authentication Methods – Harikrishna Patnala, ShapeBlueShapeBlue
 
Roundtable_-_API_Research__Testing_Tools.pdf
Roundtable_-_API_Research__Testing_Tools.pdfRoundtable_-_API_Research__Testing_Tools.pdf
Roundtable_-_API_Research__Testing_Tools.pdfMostafa Higazy
 
Transcript: Trending now: Book subjects on the move in the Canadian market - ...
Transcript: Trending now: Book subjects on the move in the Canadian market - ...Transcript: Trending now: Book subjects on the move in the Canadian market - ...
Transcript: Trending now: Book subjects on the move in the Canadian market - ...BookNet Canada
 
The Art of the Possible with Graph by Dr Jim Webber Neo4j.pptx
The Art of the Possible with Graph by Dr Jim Webber Neo4j.pptxThe Art of the Possible with Graph by Dr Jim Webber Neo4j.pptx
The Art of the Possible with Graph by Dr Jim Webber Neo4j.pptxNeo4j
 
GraphSummit London Feb 2024 - ABK - Neo4j Product Vision and Roadmap.pptx
GraphSummit London Feb 2024 - ABK - Neo4j Product Vision and Roadmap.pptxGraphSummit London Feb 2024 - ABK - Neo4j Product Vision and Roadmap.pptx
GraphSummit London Feb 2024 - ABK - Neo4j Product Vision and Roadmap.pptxNeo4j
 
Act Like an Owner, Challenge Like a VC by former CPO, Tripadvisor
Act Like an Owner,  Challenge Like a VC by former CPO, TripadvisorAct Like an Owner,  Challenge Like a VC by former CPO, Tripadvisor
Act Like an Owner, Challenge Like a VC by former CPO, TripadvisorProduct School
 
Leonis Insights: The State of AI (7 trends for 2023 and 7 predictions for 2024)
Leonis Insights: The State of AI (7 trends for 2023 and 7 predictions for 2024)Leonis Insights: The State of AI (7 trends for 2023 and 7 predictions for 2024)
Leonis Insights: The State of AI (7 trends for 2023 and 7 predictions for 2024)Jay Zhao
 
AI improves software testing to be more fault tolerant, focused and efficient
AI improves software testing to be more fault tolerant, focused and efficientAI improves software testing to be more fault tolerant, focused and efficient
AI improves software testing to be more fault tolerant, focused and efficientKari Kakkonen
 
IT Nation Evolve event 2024 - Quarter 1
IT Nation Evolve event 2024  - Quarter 1IT Nation Evolve event 2024  - Quarter 1
IT Nation Evolve event 2024 - Quarter 1Inbay UK
 
Building Bridges: Merging RPA Processes, UiPath Apps, and Data Service to bu...
Building Bridges:  Merging RPA Processes, UiPath Apps, and Data Service to bu...Building Bridges:  Merging RPA Processes, UiPath Apps, and Data Service to bu...
Building Bridges: Merging RPA Processes, UiPath Apps, and Data Service to bu...DianaGray10
 
Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...
Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...
Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...Product School
 
Trending now: Book subjects on the move in the Canadian market - Tech Forum 2024
Trending now: Book subjects on the move in the Canadian market - Tech Forum 2024Trending now: Book subjects on the move in the Canadian market - Tech Forum 2024
Trending now: Book subjects on the move in the Canadian market - Tech Forum 2024BookNet Canada
 
Relationship Counselling: From Disjointed Features to Product-First Thinking ...
Relationship Counselling: From Disjointed Features to Product-First Thinking ...Relationship Counselling: From Disjointed Features to Product-First Thinking ...
Relationship Counselling: From Disjointed Features to Product-First Thinking ...Product School
 
What’s New in CloudStack 4.19, Abhishek Kumar, Release Manager Apache CloudSt...
What’s New in CloudStack 4.19, Abhishek Kumar, Release Manager Apache CloudSt...What’s New in CloudStack 4.19, Abhishek Kumar, Release Manager Apache CloudSt...
What’s New in CloudStack 4.19, Abhishek Kumar, Release Manager Apache CloudSt...ShapeBlue
 

Recently uploaded (20)

Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24
Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24
Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24
 
Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...
Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...
Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...
 
In sharing we trust. Taking advantage of a diverse consortium to build a tran...
In sharing we trust. Taking advantage of a diverse consortium to build a tran...In sharing we trust. Taking advantage of a diverse consortium to build a tran...
In sharing we trust. Taking advantage of a diverse consortium to build a tran...
 
HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...
HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...
HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...
 
Q4 2023 Quarterly Investor Presentation - FINAL.pdf
Q4 2023 Quarterly Investor Presentation - FINAL.pdfQ4 2023 Quarterly Investor Presentation - FINAL.pdf
Q4 2023 Quarterly Investor Presentation - FINAL.pdf
 
Launching New Products In Companies Where It Matters Most by Product Director...
Launching New Products In Companies Where It Matters Most by Product Director...Launching New Products In Companies Where It Matters Most by Product Director...
Launching New Products In Companies Where It Matters Most by Product Director...
 
CloudStack Authentication Methods – Harikrishna Patnala, ShapeBlue
CloudStack Authentication Methods – Harikrishna Patnala, ShapeBlueCloudStack Authentication Methods – Harikrishna Patnala, ShapeBlue
CloudStack Authentication Methods – Harikrishna Patnala, ShapeBlue
 
Roundtable_-_API_Research__Testing_Tools.pdf
Roundtable_-_API_Research__Testing_Tools.pdfRoundtable_-_API_Research__Testing_Tools.pdf
Roundtable_-_API_Research__Testing_Tools.pdf
 
Transcript: Trending now: Book subjects on the move in the Canadian market - ...
Transcript: Trending now: Book subjects on the move in the Canadian market - ...Transcript: Trending now: Book subjects on the move in the Canadian market - ...
Transcript: Trending now: Book subjects on the move in the Canadian market - ...
 
The Art of the Possible with Graph by Dr Jim Webber Neo4j.pptx
The Art of the Possible with Graph by Dr Jim Webber Neo4j.pptxThe Art of the Possible with Graph by Dr Jim Webber Neo4j.pptx
The Art of the Possible with Graph by Dr Jim Webber Neo4j.pptx
 
GraphSummit London Feb 2024 - ABK - Neo4j Product Vision and Roadmap.pptx
GraphSummit London Feb 2024 - ABK - Neo4j Product Vision and Roadmap.pptxGraphSummit London Feb 2024 - ABK - Neo4j Product Vision and Roadmap.pptx
GraphSummit London Feb 2024 - ABK - Neo4j Product Vision and Roadmap.pptx
 
Act Like an Owner, Challenge Like a VC by former CPO, Tripadvisor
Act Like an Owner,  Challenge Like a VC by former CPO, TripadvisorAct Like an Owner,  Challenge Like a VC by former CPO, Tripadvisor
Act Like an Owner, Challenge Like a VC by former CPO, Tripadvisor
 
Leonis Insights: The State of AI (7 trends for 2023 and 7 predictions for 2024)
Leonis Insights: The State of AI (7 trends for 2023 and 7 predictions for 2024)Leonis Insights: The State of AI (7 trends for 2023 and 7 predictions for 2024)
Leonis Insights: The State of AI (7 trends for 2023 and 7 predictions for 2024)
 
AI improves software testing to be more fault tolerant, focused and efficient
AI improves software testing to be more fault tolerant, focused and efficientAI improves software testing to be more fault tolerant, focused and efficient
AI improves software testing to be more fault tolerant, focused and efficient
 
IT Nation Evolve event 2024 - Quarter 1
IT Nation Evolve event 2024  - Quarter 1IT Nation Evolve event 2024  - Quarter 1
IT Nation Evolve event 2024 - Quarter 1
 
Building Bridges: Merging RPA Processes, UiPath Apps, and Data Service to bu...
Building Bridges:  Merging RPA Processes, UiPath Apps, and Data Service to bu...Building Bridges:  Merging RPA Processes, UiPath Apps, and Data Service to bu...
Building Bridges: Merging RPA Processes, UiPath Apps, and Data Service to bu...
 
Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...
Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...
Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...
 
Trending now: Book subjects on the move in the Canadian market - Tech Forum 2024
Trending now: Book subjects on the move in the Canadian market - Tech Forum 2024Trending now: Book subjects on the move in the Canadian market - Tech Forum 2024
Trending now: Book subjects on the move in the Canadian market - Tech Forum 2024
 
Relationship Counselling: From Disjointed Features to Product-First Thinking ...
Relationship Counselling: From Disjointed Features to Product-First Thinking ...Relationship Counselling: From Disjointed Features to Product-First Thinking ...
Relationship Counselling: From Disjointed Features to Product-First Thinking ...
 
What’s New in CloudStack 4.19, Abhishek Kumar, Release Manager Apache CloudSt...
What’s New in CloudStack 4.19, Abhishek Kumar, Release Manager Apache CloudSt...What’s New in CloudStack 4.19, Abhishek Kumar, Release Manager Apache CloudSt...
What’s New in CloudStack 4.19, Abhishek Kumar, Release Manager Apache CloudSt...
 

A real time architecture using Hadoop and Storm @ FOSDEM 2013

  • 1. A real-time architecture using Hadoop and Storm.
  • 2. Speakers Nathan Bijnens Geert Van Landeghem @nathan_gs @gvanlandeghem A real-time architecture using Hadoop & Storm. 2
  • 3. Our Vision Volume Big Data test A real-time architecture using Hadoop & Storm. 3
  • 4. Big Data Velocity test A real-time architecture using Hadoop & Storm. 4
  • 5. Our Vision Volume test Variety A real-time architecture using Hadoop & Storm. 5
  • 6. Credits Nathan Marz Engineer at Backtype (now Twitter). Storm Cascalog ElephantDB manning.com/marz A real-time architecture using Hadoop & Storm. 6
  • 7. A Data System A real-time architecture using Hadoop & Storm. 7
  • 8. Data is more than Information Not all information is equal. Some information is derived from other pieces of information. A real-time architecture using Hadoop & Storm. 8
  • 9. Data is more than Information Eventually you will reach the most This is the information you hold true, simple because it exists. A real-time architecture using Hadoop & Storm. 9
  • 10. Events Everything we do generates events: - Pay with Credit Card - Commit to Git - Click on a webpage - Tweet A real-time architecture using Hadoop & Storm. 10
  • 11. Events - Before Events used to manipulate the master data. A real-time architecture using Hadoop & Storm. 11
  • 12. Events - After Today, events are the master data. A real-time architecture using Hadoop & Storm. 12
  • 13. Data System everything. A real-time architecture using Hadoop & Storm. 13
  • 14. Events Data is Immutable A real-time architecture using Hadoop & Storm. 14
  • 15. Events Data is Time Based A real-time architecture using Hadoop & Storm. 15
  • 16. Capturing change traditionally Person Location Person Location Nathan Antwerp Nathan Ghent Geert Dendermonde Geert Dendermonde John Ghent John Ghent A real-time architecture using Hadoop & Storm. 16
  • 17. Capturing change Person Location Time Person Location Time Nathan Antwerp 2005-01-01 Nathan Antwerp 2005-01-01 Geert Dendermonde 2011-10-08 Geert Dendermonde 2011-10-08 John Ghent 2010-05-02 John Ghent 2010-05-02 Nathan Ghent 2013-02-03 A real-time architecture using Hadoop & Storm. 17
  • 18. Query The data you query is often transformed, aggregated, ... A real-time architecture using Hadoop & Storm. 18
  • 19. Query Query = function ( data ) A real-time architecture using Hadoop & Storm. 19
  • 20. Number of people living in each city. Person Location Time Location Count Nathan Antwerp 2005-01-01 Ghent 2 Geert Dendermonde 2011-10-08 Dendermonde 1 John Ghent 2010-05-02 Nathan Ghent 2013-02-03 A real-time architecture using Hadoop & Storm. 20
  • 21. Query All Data Query A real-time architecture using Hadoop & Storm. 21
  • 22. Query: Precompute All Data Precomputed View Query A real-time architecture using Hadoop & Storm. 22
  • 23. Layered Architecture Batch Layer Speed Layer Serving Layer A real-time architecture using Hadoop & Storm. 23
  • 24. Layered Architecture Cassandra Query Incoming Data Hadoop Elephant DB A real-time architecture using Hadoop & Storm. 24
  • 25. Batch Layer A real-time architecture using Hadoop & Storm. 25
  • 26. Batch Layer Incoming Data Hadoop Elephant DB A real-time architecture using Hadoop & Storm. 26
  • 27. Batch Layer Unrestrained computation. A real-time architecture using Hadoop & Storm. 27
  • 28. Batch Layer Horizontal scalable. A real-time architecture using Hadoop & Storm. 28
  • 29. Batch Layer High Latency. matter. A real-time architecture using Hadoop & Storm. 29
  • 30. Batch Layer Stores master copy of data set... append only. A real-time architecture using Hadoop & Storm. 30
  • 31. Batch Layer A real-time architecture using Hadoop & Storm. 31
  • 32. Batch: View generation View #1 Master Dataset View #2 MapReduce View #3 A real-time architecture using Hadoop & Storm. 32
  • 33. MapReduce 1. Take a large problem and divide it into sub-problems … MAP 2. Perform the same function on all sub-problems … DoWork() DoWork() DoWork() 3. Combine the output from all sub-problems REDUCE … Output A real-time architecture using Hadoop & Storm. 33
  • 34. Batch View Database Read only database. No random writes required. A real-time architecture using Hadoop & Storm. 34
  • 35. Batch View Database ElephantDB Splout A real-time architecture using Hadoop & Storm. 35
  • 36. Batch Layer Just a few hours of data. Not yet Data absorbed into Batch Views absorbed. Time Now A real-time architecture using Hadoop & Storm. 36
  • 37. Speed Layer A real-time architecture using Hadoop & Storm. 37
  • 38. Overview Cassandra Incoming Data Hadoop Elephant DB A real-time architecture using Hadoop & Storm. 38
  • 39. Speed Layer Stream processing. A real-time architecture using Hadoop & Storm. 39
  • 40. Speed Layer Continuous computation. A real-time architecture using Hadoop & Storm. 40
  • 41. Speed Layer Transactional. A real-time architecture using Hadoop & Storm. 41
  • 42. Speed Layer Storing a limited window of data. Compensating for the last few hours of data. A real-time architecture using Hadoop & Storm. 42
  • 43. Speed Layer All the complexity is isolated in the Speed layer auto- corrected. A real-time architecture using Hadoop & Storm. 43
  • 44. CAP You have a choice between: Availability - Queries are eventual consistent. Consistency - Queries are consistent. A real-time architecture using Hadoop & Storm. 44
  • 45. Eventual accuracy Some algorithms are hard to implement in real time. For those cases we could estimate the results. A real-time architecture using Hadoop & Storm. 45
  • 46. Speed Layer Real Time View 1 Incoming Data Real Time View 2 A real-time architecture using Hadoop & Storm. 46
  • 47. Storm Message passing. Distributed processing. Horizontally scalable. Incremental algorithms. Fast. Data in motion. A real-time architecture using Hadoop & Storm. 47
  • 48. Storm Message passing. Distributed processing. Horizontally scalable. Incremental algorithms. Fast. Data in motion. A real-time architecture using Hadoop & Storm. 48
  • 49. Storm Nimbus Zookeeper Supervisor Supervisor Supervisor Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Node Worker Node Worker Node A real-time architecture using Hadoop & Storm. 49
  • 50. Storm Tuple Stream A real-time architecture using Hadoop & Storm. 50
  • 51. Storm Spout Bolt A real-time architecture using Hadoop & Storm. 51
  • 52. Storm Grouping A real-time architecture using Hadoop & Storm. 52
  • 53. Speed Layer Views The views are stored in Read & Write database. - Cassandra - Hbase - MongoDB - MySQL - ElasticSearch - Much more complex than a read only view. A real-time architecture using Hadoop & Storm. 53
  • 54. Serving Layer A real-time architecture using Hadoop & Storm. 54
  • 55. Overview Cassandra Query Incoming Data Hadoop Elephant DB A real-time architecture using Hadoop & Storm. 55
  • 56. Serving Layer This layer queries the Batch & Real Time views and merges it. A real-time architecture using Hadoop & Storm. 56
  • 57. Serving Layer Batch Views Merge Real Time Views A real-time architecture using Hadoop & Storm. 57
  • 58. Overview A real-time architecture using Hadoop & Storm. 58
  • 59. Overview Cassandra Query Incoming Data Hadoop Elephant DB A real-time architecture using Hadoop & Storm. 59
  • 60. Lambda Architecture Can discard any view, batch and real time, and just recreate everything from the master data. Mistakes are corrected via recomputation. - Write bad data? Remove the data & recompute. - Bug in view generation? Just recompute the view. Data storage is highly optimized. A real-time architecture using Hadoop & Storm. 60
  • 61. Recommendations A real-time architecture using Hadoop & Storm. 61
  • 62. Serialization & Schema Catch errors as quickly as they happen. Validation on write vs on read. A real-time architecture using Hadoop & Storm. 62
  • 63. Serialization & Schema CSV is actually a serialization language that is just poorly defined. A real-time architecture using Hadoop & Storm. 63
  • 64. Serialization & Schema Use a format with a schema. - Thrift - Avro - Protobuffers A real-time architecture using Hadoop & Storm. 64
  • 65. Questions? What are your needs? @nathan_gs & @gvanlandeghem A real-time architecture using Hadoop & Storm. 65
  • 66. DataCrunchers We enable companies in envisioning, defining and implementing a data strategy. A one-stop-shop for all your Big Data needs. The first Big Data Consultancy agency in Belgium. A real-time architecture using Hadoop & Storm. 66
  • 67. Jobs We are hiring. jobs@datacrunchers.eu A real-time architecture using Hadoop & Storm. 67