bvishnu
Apache Storm
• A stream processing framework
• Used to pull data from a stream and perform real-time analytics on the data
a Stream…
• Can be Apache Kafka, Amazon Kinesis.
• Normally has partitions / shards for better read & write throughput
Partition Metadata
• Storm uses INTEGERS (0, 1, …) to identify partitions.
• Whereas …
• Amazon Kinesis uses STRINGS to identify partitions.
So how can we process data?
• User sorts the STRINGS (shard IDs)
• User maps the sorted shard IDs to the integers 0…N
Shard-id-0001 <-> 0
Shard-id-0002 <-> 1
…
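The sort-and-map step above can be sketched as follows. This is a minimal illustration, not the spout's actual code; the function name `build_partition_map` is hypothetical, and the shard IDs are the ones shown on the slide.

```python
def build_partition_map(shard_ids):
    """Map each string shard ID to an integer index, in sorted order."""
    # Sorting makes the mapping deterministic for a given set of shards.
    return {shard_id: index for index, shard_id in enumerate(sorted(shard_ids))}


# Shard IDs may arrive in any order; the mapping depends only on the set.
shards = ["Shard-id-0002", "Shard-id-0001"]
partition_map = build_partition_map(shards)
print(partition_map)  # {'Shard-id-0001': 0, 'Shard-id-0002': 1}
```

Note that the index assigned to a shard depends on every other shard in the list, which is what matters once the list changes.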
Storm API
Shard Split in Amazon Kinesis
Stream shrinks (3 to 2 shards)
Disturbance in the Force
• Storm partition metadata is NO longer valid, as the shard has been deleted.
• Storm partition metadata should now be:
shard-2 <-> 0
shard-3 <-> 1
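To make the conflict concrete, here is a small sketch that recomputes the sorted mapping before and after the deletion. It assumes the three shards are named shard-1, shard-2 and shard-3, as in the solution slides; `build_partition_map` is a hypothetical helper, not part of Storm.

```python
def build_partition_map(shard_ids):
    """Map each string shard ID to an integer index, in sorted order."""
    return {shard_id: index for index, shard_id in enumerate(sorted(shard_ids))}


# Assumed shard names: the parent (shard-1) plus its two children.
before = build_partition_map(["shard-1", "shard-2", "shard-3"])
# After shard-1 is deleted the stream shrinks from 3 to 2 shards.
after = build_partition_map(["shard-2", "shard-3"])

# Shards whose integer index changed: (old index, new index).
moved = {s: (before[s], after[s]) for s in after if before[s] != after[s]}
print(moved)  # {'shard-2': (1, 0), 'shard-3': (2, 1)}
```

Every surviving shard gets a new integer, so any state stored against the old integers now points at the wrong shard.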
a Solution:
• WHITE_LIST of shards for a Storm topology.
• A Storm topology pulls from a specific set of shards.
• So in our case:
– start topology-1 with WHITELIST = “shard-1”
– split shard
– start topology-2 with WHITELIST = “shard-2 & 3”
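The whitelist idea can be sketched like this. It is an illustration under assumptions: `partitions_for_topology` is a hypothetical function, and how the whitelist is actually passed to a topology is not shown in the slides.

```python
def partitions_for_topology(stream_shards, whitelist):
    """Index only the whitelisted shards that currently exist in the stream."""
    allowed = sorted(s for s in stream_shards if s in whitelist)
    return {shard_id: index for index, shard_id in enumerate(allowed)}


# Before the split the stream has a single shard.
topology_1 = partitions_for_topology(["shard-1"], {"shard-1"})

# After the split the stream lists the parent and both children.
stream = ["shard-1", "shard-2", "shard-3"]
topology_2 = partitions_for_topology(stream, {"shard-2", "shard-3"})

print(topology_1)  # {'shard-1': 0}
print(topology_2)  # {'shard-2': 0, 'shard-3': 1}
```

Each topology numbers only its own shards, so the two sets of integers never have to agree with each other.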
a Solution…
• When shard-1 gets deleted, topology 1 dies with it.
• Topology 2 continues processing data for the new shards.
So there is NO metadata conflict, as there are 2 different topologies pulling data from different sets of shards.
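A quick check of that claim, again as a sketch with a hypothetical `partitions_for_topology` helper and the shard names used in the slides: the mapping seen by topology 2 is the same before and after shard-1 disappears.

```python
def partitions_for_topology(stream_shards, whitelist):
    """Index only the whitelisted shards that currently exist in the stream."""
    allowed = sorted(s for s in stream_shards if s in whitelist)
    return {shard_id: index for index, shard_id in enumerate(allowed)}


before_delete = ["shard-1", "shard-2", "shard-3"]
after_delete = ["shard-2", "shard-3"]
whitelist_2 = {"shard-2", "shard-3"}

# Topology 2 sees identical metadata on both sides of the deletion.
unchanged = (partitions_for_topology(before_delete, whitelist_2)
             == partitions_for_topology(after_delete, whitelist_2))
# Topology 1 is left with nothing to read once shard-1 is gone.
topology_1_after = partitions_for_topology(after_delete, {"shard-1"})

print(unchanged)         # True
print(topology_1_after)  # {}
```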
Thank you & may the force be with you!
jaihind213@gmail.com
sweetweet213@twitter
mash213.wordpress.com
linkedin.com/in/213vishnu

StormWars - when the data stream shrinks
