Apache Storm
Real-Time Event Processing
Big Data Tools
• Data Processing
  • Perform calculations on datasets, e.g. Storm
• Data Transfer
  • Gather & ingest data into data processing systems, e.g. Kafka
• Data Storage
  • Store the datasets during various data processing stages, e.g. Hadoop
Apache Storm
A distributed, real-time computation framework used
to process unbounded streams of data.
• It integrates with messaging and persistence
frameworks.
• It consumes streams of data from different data sources.
• It processes and transforms the streams in different ways.
Apache Storm Concepts
Topology
A Storm topology represents a graph of computations using:
• Nodes
  • Represent individual computations
• Edges
  • Represent data being passed between nodes
A topology is driven by a continuous live feed of data and
performs operations on that data as it arrives.
Topology
Node --Edge--> Node --Edge--> Node
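To make the graph idea concrete, here is a minimal sketch in plain Python. It does not use the Storm API; the node names (`source`, `parse`, `count`) and the helper `downstream_order` are hypothetical and chosen only for illustration. It stores the edges as an adjacency mapping and walks the nodes in the order data would reach them.

```python
from collections import deque

# Hypothetical node names; each key lists the nodes it sends data to.
edges = {
    "source": ["parse"],
    "parse": ["count"],
    "count": [],
}

def downstream_order(edges, start):
    """Breadth-first walk from `start`, following the edges."""
    seen = {start}
    order = []
    pending = deque([start])
    while pending:
        node = pending.popleft()
        order.append(node)
        for nxt in edges[node]:
            if nxt not in seen:
                seen.add(nxt)
                pending.append(nxt)
    return order

print(downstream_order(edges, "source"))  # ['source', 'parse', 'count']
```

A real topology can branch and merge; the same mapping handles that by listing more than one target per node.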
Apache Storm Concepts
• Tuple
  • Data is sent between nodes in the form of tuples.
• Stream
  • An unbounded sequence of tuples flowing between two nodes.
• Spout
  • The source of a stream in a topology.
• Bolt
  • A computational node that accepts an input stream and performs
    computations on it.
Topology
Messaging System (live feed of data) --> Spout --Stream--> Bolt --Stream--> Bolt
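The four concepts can be sketched with Python generators. This is an illustration only, not Storm code: the names `number_spout`, `square_bolt`, and `even_filter_bolt` are hypothetical. The spout is an endless generator (an unbounded stream), each bolt consumes one stream and yields another, and every item passed along is a tuple.

```python
import itertools

def number_spout():
    """Source of the stream: emits one-field tuples without end."""
    for n in itertools.count(1):
        yield (n,)

def square_bolt(stream):
    """Transforms each (n,) tuple into (n, n squared)."""
    for (n,) in stream:
        yield (n, n * n)

def even_filter_bolt(stream):
    """Passes on only the tuples whose square is even."""
    for n, sq in stream:
        if sq % 2 == 0:
            yield (n, sq)

# Wire the nodes together: spout -> bolt -> bolt.
topology = even_filter_bolt(square_bolt(number_spout()))

# The stream never ends, so take only the first three results.
first_three = list(itertools.islice(topology, 3))
print(first_three)  # [(2, 4), (4, 16), (6, 36)]
```

Because generators are lazy, nothing is computed until a downstream consumer asks for the next tuple, which is what lets the sketch work with an unbounded source.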
Apache Storm Concepts
• Spout
  • Receives data by
    • Listening to a message queue for incoming messages
    • Listening for database changes
    • Listening to any other source of a data feed
  • Acts as the source of a stream
    • Reads data from the data source
    • Emits tuples to the next type of node, called a Bolt.
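As a rough sketch of a spout that listens to a message queue, the following uses Python's standard `queue` module in place of a real messaging system. The class `QueueSpout` and its `next_tuple` method are hypothetical names for this example and are not the Storm interface. Each call polls the queue without blocking and wraps any waiting message in a tuple.

```python
import queue

class QueueSpout:
    """Hypothetical spout: turns queued messages into one-field tuples."""

    def __init__(self, source):
        self.source = source

    def next_tuple(self):
        """Return one tuple if a message is waiting, otherwise None."""
        try:
            message = self.source.get_nowait()
        except queue.Empty:
            return None
        return (message,)

# Stand-in for the live feed: two messages already waiting in the queue.
messages = queue.Queue()
for text in ["login alice", "login bob"]:
    messages.put(text)

spout = QueueSpout(messages)
emitted = []
tup = spout.next_tuple()
while tup is not None:
    emitted.append(tup)
    tup = spout.next_tuple()

print(emitted)  # [('login alice',), ('login bob',)]
```

Returning `None` instead of blocking keeps the polling loop responsive when the queue is empty; a long-running spout would simply keep polling rather than stop.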
Apache Storm Concepts
• Bolt
  • Accepts tuples from its input stream
  • Performs a computation or transformation
    • For example filtering, aggregation, or a join
  • Emits new tuples to its output stream
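An aggregating bolt can be sketched as a small stateful class. This is plain Python rather than the Storm API, and the names `WordCountBolt` and `execute` are assumptions made for the example. The bolt accepts a one-field tuple holding a sentence, updates a running count per word, and returns the new `(word, count)` tuples it would emit.

```python
from collections import Counter

class WordCountBolt:
    """Hypothetical bolt: keeps a running count of every word seen."""

    def __init__(self):
        self.counts = Counter()

    def execute(self, tup):
        """Process one input tuple and return the tuples to emit."""
        (sentence,) = tup
        out = []
        for word in sentence.split():
            self.counts[word] += 1
            out.append((word, self.counts[word]))
        return out

bolt = WordCountBolt()
results = []
for tup in [("storm kafka",), ("storm",)]:
    results.extend(bolt.execute(tup))

print(results)  # [('storm', 1), ('kafka', 1), ('storm', 2)]
```

The state lives inside the bolt instance, so the count for a word is only correct if every tuple containing that word reaches the same instance.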
Apache Kafka
Kafka is a distributed publish-subscribe messaging
system that is designed to be fast, scalable, and
durable.
• Kafka maintains feeds of messages in topics.
• Producers write data to topics and consumers read from
topics.
• Topics are partitioned and replicated across multiple nodes.
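The topic model can be illustrated with a small in-memory sketch. It is not a Kafka client: the class `Topic`, its methods, and the topic name `events` are hypothetical, replication is not modeled, and `zlib.crc32` is only a stand-in for Kafka's real partitioner. Each partition is an append-only list, a producer picks a partition from the message key, and a consumer reads a partition from a given offset.

```python
import zlib

class Topic:
    """Hypothetical in-memory topic split into append-only partitions."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # crc32 is a deterministic stand-in, not Kafka's partitioner.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def produce(self, key, value):
        """Append a message; return (partition, offset) of the new record."""
        p = self.partition_for(key)
        self.partitions[p].append(value)
        return p, len(self.partitions[p]) - 1

    def consume(self, partition, offset):
        """Return every message in `partition` from `offset` onward."""
        return self.partitions[partition][offset:]

topic = Topic("events", num_partitions=3)
p1, off1 = topic.produce("user-1", "clicked")
p2, off2 = topic.produce("user-1", "purchased")

# The same key always maps to the same partition, so order is preserved.
print(p1 == p2)              # True
print(topic.consume(p1, 0))  # ['clicked', 'purchased']
```

Reading does not remove messages in this sketch; a consumer only tracks the offset it has reached, so several consumers can read the same partition independently.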
Kafka Configuration
KAFKA_HOME\config\server.properties
# A comma-separated list of directories under which to store log files
log.dirs=C:/Installers/kafka/kafka-logs
KAFKA_HOME\config\zookeeper.properties
# the directory where the snapshot is stored.
dataDir=C:/Installers/kafka/zookeeper-data
Kafka Commands
Start Zookeeper
$ bin/zookeeper-server-start.sh config/zookeeper.properties
$ bin\windows\zookeeper-server-start.bat config\zookeeper.properties
Start Kafka Broker
$ bin/kafka-server-start.sh config/server.properties
$ bin\windows\kafka-server-start.bat config\server.properties
List Topics
$ bin/kafka-topics.sh --list --zookeeper localhost:2181
$ bin\windows\kafka-topics.bat --list --zookeeper localhost:2181
