Apache Storm - Real Time Analytics
Published in: Education

  • 1. Introduction to Real Time Analytics using Apache Storm. Post your questions on Twitter @edurekaIN: #askEdureka
  • 2. Objectives of this Session
      • The need for Real Time Analytics: use cases
      • How does Storm come to the rescue?
      • Where does Storm fit in the Hadoop framework?
      • Storm Architecture – Components of Storm
      • Quiz to reinforce your learning
    For queries during the session and the class recording: post on Twitter @edurekaIN: #askEdureka, or on Facebook /edurekaIN
  • 3. Need for Real Time Analytics
      • Banking: Fraud Transaction Detection
      • Telecommunication: Silent Roamers Detection
      • Retail: Inventory Dynamic Pricing
      • Social Networking: Trending Topics
  • 4. Growing Interest in Apache Storm
  • 5. Storm Use Cases – Need for Real Time Analytics
      • Twitter Trends
      • Responsive Logs
      • Custom Magazine Feeds
      • Real Time Video Analytics
      • Enable Clinicians to Make Medical Decisions
      • Compare and Display Real Time Prices
  • 6. What is Storm?
      • Apache Storm is a free and open source distributed real-time computation system.
      • Storm makes it easy to reliably process unbounded streams of data.
      • Storm does for real-time processing what Hadoop did for batch processing.
      • Storm is simple and can be used with any programming language.
  • 7. Understanding the Storm Architecture (diagram: one Nimbus node, three ZooKeeper nodes, five Supervisor nodes) *Covered in module 2 in the course
  • 8. Storm Components (diagram: one Nimbus node, three ZooKeeper nodes, five Supervisor nodes)
    A Storm cluster has 3 sets of nodes:
      1. Nimbus node (master node, similar to the Hadoop JobTracker):
          » Uploads computations for execution
          » Distributes code across the cluster
          » Launches workers across the cluster
          » Monitors computation and reallocates workers as needed
      2. ZooKeeper nodes:
          » Coordinate the Storm cluster
      3. Supervisor nodes:
          » Communicate with Nimbus through ZooKeeper, and start and stop workers according to signals from Nimbus
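The master/worker split described on this slide can be illustrated with a toy scheduler. This is only a sketch in plain Python, not Storm's scheduler or API: the function names, the round-robin policy, and the `sup-a`/`worker-0` labels are all assumptions made for illustration, and ZooKeeper's coordination role is left out entirely.

```python
from collections import defaultdict

def assign_workers(workers, supervisors):
    """Spread workers round-robin over live supervisors (toy stand-in for Nimbus)."""
    if not supervisors:
        raise ValueError("no live supervisors")
    assignment = defaultdict(list)
    for i, worker in enumerate(workers):
        assignment[supervisors[i % len(supervisors)]].append(worker)
    return dict(assignment)

def reassign_after_failure(assignment, dead_supervisor):
    """Move the dead supervisor's workers onto the survivors, least loaded first."""
    orphaned = assignment.get(dead_supervisor, [])
    survivors = {s: list(ws) for s, ws in assignment.items() if s != dead_supervisor}
    if not survivors:
        raise ValueError("no live supervisors")
    for worker in orphaned:
        # Pick the survivor with the fewest workers; break ties by name.
        target = min(survivors, key=lambda s: (len(survivors[s]), s))
        survivors[target].append(worker)
    return survivors

# Hypothetical cluster: six workers on three supervisor nodes.
workers = [f"worker-{n}" for n in range(6)]
supervisors = ["sup-a", "sup-b", "sup-c"]
before = assign_workers(workers, supervisors)
after = reassign_after_failure(before, "sup-b")
```

After `sup-b` is removed, every worker still has a home on one of the two remaining supervisors, which is the "reallocates workers as needed" behaviour in miniature.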
  • 9. Storm Topology: The work is delegated to different types of components that are each responsible for a simple specific processing task. The input stream of a Storm cluster is handled by a component called a spout. The spout passes the data to a component called a bolt, which transforms it in some way. A bolt either persists the data in some sort of storage, or passes it to some other bolt. (diagram: an input data source feeds the spouts, which pass data to bolts; bolts transform the data and either pass it to further bolts or write it to data storage)
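The spout-to-bolt flow above can be mimicked with Python generators. This is a minimal sketch of the idea only and does not use Storm's API: `sentence_spout`, `split_bolt`, `count_bolt`, and the sample sentences are hypothetical names and data chosen for illustration.

```python
from collections import Counter

def sentence_spout(lines):
    """Spout: the entry point that turns an input source into a stream of tuples."""
    for line in lines:
        yield (line,)

def split_bolt(stream):
    """Bolt: transforms each sentence tuple into one tuple per word and passes it on."""
    for (sentence,) in stream:
        for word in sentence.lower().split():
            yield (word,)

def count_bolt(stream, storage):
    """Terminal bolt: persists running counts in `storage` instead of emitting."""
    for (word,) in stream:
        storage[word] += 1

# Wire the toy topology: spout -> split bolt -> count bolt -> storage.
storage = Counter()
source = ["storm processes streams", "storm scales out"]
count_bolt(split_bolt(sentence_spout(source)), storage)
print(storage.most_common(1))
```

Here `split_bolt` plays the bolt that passes data to another bolt, while `count_bolt` plays the bolt that persists data to storage. A real topology runs these components in parallel across a cluster rather than as one chained generator.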
  • 10. Why Storm is ideal for Real Time Processing
      • Fast: benchmarked as processing one million 100-byte messages per second per node.
      • Scalable: parallel calculations run across a cluster of machines.
      • Fault-tolerant: when workers die, Storm automatically restarts them. If a node dies, the worker is restarted on another node.
      • Reliable: Storm guarantees that each unit of data (tuple) will be processed at least once or exactly once (exactly-once processing is provided through the Trident API). Messages are only replayed when there are failures.
      • Easy to operate: standard configurations are suitable for production on day one. Once deployed, Storm is easy to operate.
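The "replayed only when there are failures" guarantee can be sketched as a retry loop. This is a plain-Python illustration of at-least-once delivery under simplifying assumptions, not Storm's acking mechanism: `process_with_replay`, `flaky_handler`, and the `max_attempts` limit are hypothetical, and a raised exception stands in for a failed tuple.

```python
def process_with_replay(tuples, handler, max_attempts=3):
    """Feed each tuple to handler; replay it if handler raises, until it succeeds."""
    attempts = {}
    pending = list(tuples)
    while pending:
        item = pending.pop(0)
        attempts[item] = attempts.get(item, 0) + 1
        try:
            handler(item)          # returning normally counts as an ack
        except Exception:
            if attempts[item] >= max_attempts:
                raise              # give up after too many failures
            pending.append(item)   # failure: queue the tuple for replay
    return attempts

seen = []

def flaky_handler(item):
    """Hypothetical handler whose first attempt at "b" fails."""
    seen.append(item)
    if item == "b" and seen.count("b") == 1:
        raise RuntimeError("simulated worker failure")

attempts = process_with_replay(["a", "b", "c"], flaky_handler)
```

Only the failed tuple is replayed, but it reaches the handler twice. That is why at-least-once processing can produce duplicate side effects, and why exactly-once semantics need extra machinery on top.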
  • 11. Storm in the Hadoop Framework (diagram of workloads running side by side): MapReduce (Batch), INTERACTIVE (Text), ONLINE (HBase), STORM (Streaming), GRAPH (Giraph), IN-MEMORY (Spark), HPC MPI (OpenMPI), OTHER (Search) (Weave..)
  • 12. Upcoming Batch for Storm
      • Start Date:
          • 16th Aug (08:30 PM – 11:30 PM, India Time) / 16th Aug (08:00 AM – 11:00 AM, Pacific Time)
          • 13th Sep (7:00 AM – 10:00 AM, India Time) / 12th Sep (06:30 PM – 09:30 PM, Pacific Time)
      • Curriculum:
          • Module 1: Introduction of Big Data and Storm
          • Module 2: Getting Started with Storm
          • Module 3: Spouts and Bolts
          • Module 4: Trident Topologies
          • Module 5: Real Life Storm Project – 1
          • Module 6: Real Life Storm Project – 2
  • 13. Annie’s Question: Storm can be used in: - Real-time Processing - Batch Processing - Both
  • 14. Annie’s Answer: Real-time Processing
  • 15. Annie’s Question: Which of them can be a source of a stream? - Spout - Bolt - Both
  • 16. Annie’s Answer: Both
  • 17. Annie’s Question: It is not possible to run Storm processes along with MapReduce jobs inside a Hadoop cluster. - True - False
  • 18. Annie’s Answer: False. With Hadoop 2.0, it is possible.
  • 19. Annie’s Question: A Nimbus node is similar to a TaskTracker node in a Hadoop cluster. - True - False
  • 20. Annie’s Answer: False. A Nimbus node is more like a JobTracker node in Hadoop.
  • 21. Annie’s Question: A Storm topology is defined in terms of: - Nimbus, Zookeeper, Supervisor nodes - Spout, Bolt - Spout, Bolt, Nimbus, Zookeeper, Supervisor nodes - Spout, Bolt, Zookeeper node
  • 22. Annie’s Answer: Spout and Bolt
  • 23. Questions?
      • Batch Starts On: 16th Aug 08:30 PM, IST / 16th Aug 08:00 AM, PDT; 13th Sep 7:00 AM, IST / 12th Sep 06:30 PM, PDT
      • Course Fee: USD 339 / INR (17795 + 12.36% Service tax)**
      • For Existing edureka Customers (20% OFF): Price: USD 271 / INR 14326