Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Slide #2: How to Setup Apache STROM

This slide shows how to install Apache in WIndows PCs.

  • Login to see the comments

  • Be the first to like this

Slide #2: How to Setup Apache STROM

  1. 1. Firing up Strom Md. Shamsur Rahim Student of MScCS American International University Bangladesh
  2. 2. Required Software Package • Java SDK 6 ▫ nloads/index.html • Maven ▫ software dependency management tool and is used to manage the project's build, reporting, and documentation. ▫ • Spring Tool Suite: IDE ▫ is an integrated development environment and is used to develop applications. ▫
  3. 3. Configuring Spring Tool Suite: IDE • Go to Windows| Preferences| Maven| Installations and add the path of maven-3.0.4, as shown in the following screenshot: • Go to Window| Preferences| Java| Installed JREs and add the path of Java Runtime Environment 6(JRE 6), as shown in the following screenshot:
  4. 4. Developing a sample topology
  5. 5. Steps to create and execute a sample topology 1. Start your STS IDE and create a Maven project. 2. Select your Workspace location and click “Next”. 3. Now select “org.apache.maven.archtypes” [A Maven Java plugin dev. project] and click Next. 4. Specify com.learningstorm as Group Id and storm-example as Artifact Id, as shown in the following screenshot:
  6. 6. Storm components A Storm cluster follows a master- slave model where the master and slave processes are coordinated through ZooKeeper.
  7. 7. Storm components: Nimbus • Nimbus: ▫ is the master in a Storm cluster. ▫ It is responsible for :  distributing application code across various worker nodes  assigning tasks to different machines  monitoring tasks for any failures  restarting them as and when required ▫ It is stateless. ▫ Stores all of its data in ZooKeeper. ▫ Only one Nimbus node in a Storm Cluster. ▫ It can be restarted without having any effects on the already running tasks on the worker nodes.
  8. 8. Storm components: Supervisor nodes • Supervisor nodes ▫ are the worker nodes in a Storm cluster. ▫ Each supervisor node runs a supervisor daemon that is responsible for-  creating,  Starting and  Stopping worker processes to execute the tasks assigned to that node. ▫ Like Nimbus, a supervisor daemon is also fail-fast and stores all of its state in ZooKeeper so that it can be restarted without any state loss. ▫ A single supervisor daemon normally handles multiple worker processes .
  9. 9. Storm components: The ZooKeeper cluster • ZooKeeper is an application that: ▫ Coordinates, ▫ Shares some configuration information Among various processes. ▫ Being a distributed application, Storm also uses a ZooKeeper cluster to coordinate various processes. ▫ All of the states associated with the cluster and the various tasks submitted to the Storm are stored in ZooKeeper. ▫ Nimbus and Supervisor Nodes communicate only through ZooKeeper. ▫ As all data are stored in ZooKeeper, So any time we can kill Nimbus or supervisor daemons without affecting cluster.
  10. 10. The Storm data model • Basic unit of data that can be processed by a Storm application is called a tuple. • Each tuple consists of a predefined list of fields. • The value of each field can be a byte, char, integer, long, float, double, Boolean, or byte array. • Storm also provides an API to define your own data types, which can be serialized as fields in a tuple. • A tuple is dynamically typed, that is, you just need to define the names of the fields in a tuple and not their data type. • Fields in a tuple can be accessed by its name getValueByField(String) or its positional index getValue(int).
  11. 11. Definition of a Storm topology • A topology is an abstraction that defines the graph of the computation. • We create a Storm topology and deploy it on a Storm cluster to process the data. • A topology can be represented by a direct acyclic graph.
  12. 12. The Components of a Storm topology: Stream • Stream: ▫ The key abstraction in Storm is that of a stream. ▫ It is an unbounded sequence of tuples that can be processed in parallel by Storm. ▫ Each stream can be processed by a single or multiple types of bolts. ▫ Each stream in a Storm application is given an ID. ▫ The bolts can produce and consume tuples from these streams on the basis of their ID. ▫ Each stream also has an associated schema for the tuples that will flow through it.(????)
  13. 13. The Components of a Storm topology: Spout • Spout: ▫ A spout is the source of tuples in a Storm topology. ▫ It is responsible for reading or listening to data from an external source and publishing them— emitting (into stream). ▫ A spout can emit multiple streams. ▫ Some important methods of spout are: nextTuple() ack(Object msgId) open() fail(Object msgId)
  14. 14. The Components of a Storm topology: Bolt• Bolt: ▫ A bolt is the processing powerhouse of a Storm topology. ▫ It is responsible for transforming a stream. ▫ each bolt in the topology should be doing a simple transformation of the tuples. ▫ many such bolts can coordinate with each other to exhibit a complex transformation. ▫ Some important methods of Bolt are:  execute(Tuple input):  This method is executed for each tuple that comes through the subscribed input streams.  prepare(Map stormConf, TopologyContext context, OutputCollector collector):  In this method, you should make sure the bolt is properly configured to execute tuples now.
  15. 15. Operation modes Operation modes indicate how the topology is deployed in Storm. • The local mode: ▫ Storm topologies run on the local machine in a single JVM. • The remote mode: ▫ we will use the Storm client to submit the topology to the master along with all the necessary code required to execute the topology.