SlideShare a Scribd company logo
1 of 34
Download to read offline
Apache Storm Concepts
André Dias
andre.dias@summa.com.br
Highlights
● Background
● Storm’s History
● Concepts
● Integrations
● Real Cases
● Q&A
Background
Background
Big Data V’s
Background
Big Data V’s
Volume
Petabytes / Terabytes
Velocity
Real-time / Near Real-time
Variety
Sensors, Blog Posts, Logs, Social Networks...
Background
Data Streaming
Background
Data Streaming
Background
Data Streaming
Now Please… I’
m in traffic…
● Discovery
● Ingest
● Process
● Persist
● Analyze
● Expose
Background
Data Value Chain
● Discovery
● Ingest
● Process
● Persist
● Analyze
● Expose
Background
Data Value Chain (Storm Focus)
Background
Data Value Chain
● Discovery
● Ingest
● Process
● Persist
● Analyze
● Expose
Background
Data Value Chain (Ingest)
Stream Processing
Data in Motion
Batch Processing
● Data processing architecture
○ Generic
○ Scalable
○ Fault-tolerant (human/hardware)
● Low latency
Background
Lambda Architecture (LA)
Background
Lambda Architecture (LA)
Storm’s History
Nathan Marz
● Backtype (2008), acquired by Twitter (2011)
● Lambda Architecture Creator
● September, 2011. Storm Creation
● September, 2013. Storm entered to ASF
Why Use It?
● Scalable, as Hadoop
● No data loss, reliable
● Fault tolerant
● Language agnostic
● Real-time, real fast
Why Use It?
Storm is TLP !!
Concepts
Concepts
● Tuple
● Streams
● Spouts
● Bolts
● Topologies / Trident API
● Stream Groupings
Concepts
Spouts
● Source of Streams
● Data Consumers (Ingestion)
● Emits Tuples
Concepts
Bolts
● Units of Work to tuples
● Data streaming logic
● Can emit tuples as well
● Data store integration
Concepts
Topology
● Data Streaming Flow Representation
● DAG (Direct Acyclic Graph) of Spouts and
Bolts
● Streaming computation
● Each node as individual task (parallel
execution)
● Stateless
Concepts
Trident API
● Abstraction Layer over low-level Storm API
● More Complex Topologies
● Stateful
● Micro-batch
● High-level API (similar to Pig / Cascading -
Hadoop)
● Message processed at least once
(guaranteed)
Concepts
Trident API - Micro Batch
● Trident Batches
○ are Ordered
Concepts
Trident API - Micro Batch
● Trident Batches
○ can be Partitioned
Concepts
Streaming Groups
● Data Flow Control over
Topologies
Architecture
Architecture
Components - Nimbus
● Master node (similar to JobTracker)
● Monitor and distribute the processing
workload across worker nodes
● Stores all its data into Zookeeper
Architecture
Components - Supervisor
● Worker node (similar to TaskTracker)
● Monitor and distribute the processing
workload across worker nodes
● Stores all its data into Zookeeper
Architecture
Overview
● Master-slave approach
● Cluster coordination
(Zookeeper)
● Nimbus HA
Integrations
Real Cases
Collector sensor
information to a
Data Lake
Micro-batch
user contents,
content feeds
and application
logs
Real-time user
music
recommendations
Q&A
THANK YOU!
Use Storm

More Related Content

What's hot

Real-time Big Data Processing with Storm
Real-time Big Data Processing with StormReal-time Big Data Processing with Storm
Real-time Big Data Processing with Stormviirya
 
Multi-Tenant Storm Service on Hadoop Grid
Multi-Tenant Storm Service on Hadoop GridMulti-Tenant Storm Service on Hadoop Grid
Multi-Tenant Storm Service on Hadoop GridDataWorks Summit
 
Hadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureHadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureP. Taylor Goetz
 
Real Time Graph Computations in Storm, Neo4J, Python - PyCon India 2013
Real Time Graph Computations in Storm, Neo4J, Python - PyCon India 2013Real Time Graph Computations in Storm, Neo4J, Python - PyCon India 2013
Real Time Graph Computations in Storm, Neo4J, Python - PyCon India 2013Sonal Raj
 
Real time big data analytics with Storm by Ron Bodkin of Think Big Analytics
Real time big data analytics with Storm by Ron Bodkin of Think Big AnalyticsReal time big data analytics with Storm by Ron Bodkin of Think Big Analytics
Real time big data analytics with Storm by Ron Bodkin of Think Big AnalyticsData Con LA
 
Storm presentation
Storm presentationStorm presentation
Storm presentationShyam Raj
 
Apache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - VerisignApache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - VerisignMichael Noll
 
Storm-on-YARN: Convergence of Low-Latency and Big-Data
Storm-on-YARN: Convergence of Low-Latency and Big-DataStorm-on-YARN: Convergence of Low-Latency and Big-Data
Storm-on-YARN: Convergence of Low-Latency and Big-DataDataWorks Summit
 
Realtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and HadoopRealtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and HadoopDataWorks Summit
 
Introduction to Apache Storm - Concept & Example
Introduction to Apache Storm - Concept & ExampleIntroduction to Apache Storm - Concept & Example
Introduction to Apache Storm - Concept & ExampleDung Ngua
 
Apache Storm and twitter Streaming API integration
Apache Storm and twitter Streaming API integrationApache Storm and twitter Streaming API integration
Apache Storm and twitter Streaming API integrationUday Vakalapudi
 
Cassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market SceinceCassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market SceinceP. Taylor Goetz
 
Distributed real time stream processing- why and how
Distributed real time stream processing- why and howDistributed real time stream processing- why and how
Distributed real time stream processing- why and howPetr Zapletal
 
Realtime processing with storm presentation
Realtime processing with storm presentationRealtime processing with storm presentation
Realtime processing with storm presentationGabriel Eisbruch
 
Experience with Kafka & Storm
Experience with Kafka & StormExperience with Kafka & Storm
Experience with Kafka & StormOtto Mok
 
Real-Time Analytics with Kafka, Cassandra and Storm
Real-Time Analytics with Kafka, Cassandra and StormReal-Time Analytics with Kafka, Cassandra and Storm
Real-Time Analytics with Kafka, Cassandra and StormJohn Georgiadis
 

What's hot (20)

Real-time Big Data Processing with Storm
Real-time Big Data Processing with StormReal-time Big Data Processing with Storm
Real-time Big Data Processing with Storm
 
Multi-Tenant Storm Service on Hadoop Grid
Multi-Tenant Storm Service on Hadoop GridMulti-Tenant Storm Service on Hadoop Grid
Multi-Tenant Storm Service on Hadoop Grid
 
Hadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureHadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm Architecture
 
Real Time Graph Computations in Storm, Neo4J, Python - PyCon India 2013
Real Time Graph Computations in Storm, Neo4J, Python - PyCon India 2013Real Time Graph Computations in Storm, Neo4J, Python - PyCon India 2013
Real Time Graph Computations in Storm, Neo4J, Python - PyCon India 2013
 
Real time big data analytics with Storm by Ron Bodkin of Think Big Analytics
Real time big data analytics with Storm by Ron Bodkin of Think Big AnalyticsReal time big data analytics with Storm by Ron Bodkin of Think Big Analytics
Real time big data analytics with Storm by Ron Bodkin of Think Big Analytics
 
Storm presentation
Storm presentationStorm presentation
Storm presentation
 
Apache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - VerisignApache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - Verisign
 
Introduction to Apache Storm
Introduction to Apache StormIntroduction to Apache Storm
Introduction to Apache Storm
 
Storm-on-YARN: Convergence of Low-Latency and Big-Data
Storm-on-YARN: Convergence of Low-Latency and Big-DataStorm-on-YARN: Convergence of Low-Latency and Big-Data
Storm-on-YARN: Convergence of Low-Latency and Big-Data
 
Realtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and HadoopRealtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and Hadoop
 
Apache Storm Internals
Apache Storm InternalsApache Storm Internals
Apache Storm Internals
 
Storm
StormStorm
Storm
 
Resource Aware Scheduling in Apache Storm
Resource Aware Scheduling in Apache StormResource Aware Scheduling in Apache Storm
Resource Aware Scheduling in Apache Storm
 
Introduction to Apache Storm - Concept & Example
Introduction to Apache Storm - Concept & ExampleIntroduction to Apache Storm - Concept & Example
Introduction to Apache Storm - Concept & Example
 
Apache Storm and twitter Streaming API integration
Apache Storm and twitter Streaming API integrationApache Storm and twitter Streaming API integration
Apache Storm and twitter Streaming API integration
 
Cassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market SceinceCassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market Sceince
 
Distributed real time stream processing- why and how
Distributed real time stream processing- why and howDistributed real time stream processing- why and how
Distributed real time stream processing- why and how
 
Realtime processing with storm presentation
Realtime processing with storm presentationRealtime processing with storm presentation
Realtime processing with storm presentation
 
Experience with Kafka & Storm
Experience with Kafka & StormExperience with Kafka & Storm
Experience with Kafka & Storm
 
Real-Time Analytics with Kafka, Cassandra and Storm
Real-Time Analytics with Kafka, Cassandra and StormReal-Time Analytics with Kafka, Cassandra and Storm
Real-Time Analytics with Kafka, Cassandra and Storm
 

Similar to Apache Storm Concepts

AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | EnglishAWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | EnglishOmid Vahdaty
 
AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned Omid Vahdaty
 
Big Data Streaming processing using Apache Storm - FOSSCOMM 2016
Big Data Streaming processing using Apache Storm - FOSSCOMM 2016Big Data Streaming processing using Apache Storm - FOSSCOMM 2016
Big Data Streaming processing using Apache Storm - FOSSCOMM 2016Adrianos Dadis
 
Big Data in 200 km/h | AWS Big Data Demystified #1.3
Big Data in 200 km/h | AWS Big Data Demystified #1.3  Big Data in 200 km/h | AWS Big Data Demystified #1.3
Big Data in 200 km/h | AWS Big Data Demystified #1.3 Omid Vahdaty
 
AWS Big Data Demystified #1.2 | Big Data architecture lessons learned
AWS Big Data Demystified #1.2 | Big Data architecture lessons learned AWS Big Data Demystified #1.2 | Big Data architecture lessons learned
AWS Big Data Demystified #1.2 | Big Data architecture lessons learned Omid Vahdaty
 
Introduction to Spark Streaming
Introduction to Spark StreamingIntroduction to Spark Streaming
Introduction to Spark Streamingdatamantra
 
Stream processing using Apache Storm - Big Data Meetup Athens 2016
Stream processing using Apache Storm - Big Data Meetup Athens 2016Stream processing using Apache Storm - Big Data Meetup Athens 2016
Stream processing using Apache Storm - Big Data Meetup Athens 2016Adrianos Dadis
 
4th Athens Big Data Meetup - 1st Talk - Big Data Streaming Processing Using A...
4th Athens Big Data Meetup - 1st Talk - Big Data Streaming Processing Using A...4th Athens Big Data Meetup - 1st Talk - Big Data Streaming Processing Using A...
4th Athens Big Data Meetup - 1st Talk - Big Data Streaming Processing Using A...Athens Big Data
 
Introduction to near real time computing
Introduction to near real time computingIntroduction to near real time computing
Introduction to near real time computingTao Li
 
Apache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriApache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriDemi Ben-Ari
 
Interactive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark StreamingInteractive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark Streamingdatamantra
 
NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1Ruslan Meshenberg
 
Data engineering Stl Big Data IDEA user group
Data engineering   Stl Big Data IDEA user groupData engineering   Stl Big Data IDEA user group
Data engineering Stl Big Data IDEA user groupAdam Doyle
 
Big data Argentina meetup 2020-09: Intro to presto on docker
Big data Argentina meetup 2020-09: Intro to presto on dockerBig data Argentina meetup 2020-09: Intro to presto on docker
Big data Argentina meetup 2020-09: Intro to presto on dockerFederico Palladoro
 
Initial presentation of openstack (for montreal user group)
Initial presentation of openstack (for montreal user group)Initial presentation of openstack (for montreal user group)
Initial presentation of openstack (for montreal user group)Marcos García
 
Understanding Hadoop
Understanding HadoopUnderstanding Hadoop
Understanding HadoopAhmed Ossama
 
Big data real time architectures
Big data real time architecturesBig data real time architectures
Big data real time architecturesDaniel Marcous
 
Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...
Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...
Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...Hernan Costante
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2aspyker
 

Similar to Apache Storm Concepts (20)

AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | EnglishAWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
 
AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned
 
Big Data Streaming processing using Apache Storm - FOSSCOMM 2016
Big Data Streaming processing using Apache Storm - FOSSCOMM 2016Big Data Streaming processing using Apache Storm - FOSSCOMM 2016
Big Data Streaming processing using Apache Storm - FOSSCOMM 2016
 
Big Data in 200 km/h | AWS Big Data Demystified #1.3
Big Data in 200 km/h | AWS Big Data Demystified #1.3  Big Data in 200 km/h | AWS Big Data Demystified #1.3
Big Data in 200 km/h | AWS Big Data Demystified #1.3
 
AWS Big Data Demystified #1.2 | Big Data architecture lessons learned
AWS Big Data Demystified #1.2 | Big Data architecture lessons learned AWS Big Data Demystified #1.2 | Big Data architecture lessons learned
AWS Big Data Demystified #1.2 | Big Data architecture lessons learned
 
Introduction to Spark Streaming
Introduction to Spark StreamingIntroduction to Spark Streaming
Introduction to Spark Streaming
 
Stream processing using Apache Storm - Big Data Meetup Athens 2016
Stream processing using Apache Storm - Big Data Meetup Athens 2016Stream processing using Apache Storm - Big Data Meetup Athens 2016
Stream processing using Apache Storm - Big Data Meetup Athens 2016
 
4th Athens Big Data Meetup - 1st Talk - Big Data Streaming Processing Using A...
4th Athens Big Data Meetup - 1st Talk - Big Data Streaming Processing Using A...4th Athens Big Data Meetup - 1st Talk - Big Data Streaming Processing Using A...
4th Athens Big Data Meetup - 1st Talk - Big Data Streaming Processing Using A...
 
Introduction to near real time computing
Introduction to near real time computingIntroduction to near real time computing
Introduction to near real time computing
 
Apache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriApache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-Ari
 
Interactive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark StreamingInteractive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark Streaming
 
NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1
 
Data engineering Stl Big Data IDEA user group
Data engineering   Stl Big Data IDEA user groupData engineering   Stl Big Data IDEA user group
Data engineering Stl Big Data IDEA user group
 
Handout: 'Open Source Tools & Resources'
Handout: 'Open Source Tools & Resources'Handout: 'Open Source Tools & Resources'
Handout: 'Open Source Tools & Resources'
 
Big data Argentina meetup 2020-09: Intro to presto on docker
Big data Argentina meetup 2020-09: Intro to presto on dockerBig data Argentina meetup 2020-09: Intro to presto on docker
Big data Argentina meetup 2020-09: Intro to presto on docker
 
Initial presentation of openstack (for montreal user group)
Initial presentation of openstack (for montreal user group)Initial presentation of openstack (for montreal user group)
Initial presentation of openstack (for montreal user group)
 
Understanding Hadoop
Understanding HadoopUnderstanding Hadoop
Understanding Hadoop
 
Big data real time architectures
Big data real time architecturesBig data real time architectures
Big data real time architectures
 
Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...
Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...
Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2
 

Apache Storm Concepts