Apache Storm - Real Time Analytics


  • 1. Introduction to Real Time Analytics using Apache Storm. Buy the complete course at: www.edureka.in/apache-storm. Post your questions on Twitter @edurekaIN: #askEdureka
  • 2. Objectives of this Session • The need for Real Time Analytics: use cases • How does Storm come to the rescue? • Where does Storm fit in the Hadoop framework? • Storm architecture: components of Storm • Quiz to reinforce your learning. For queries during the session and the class recording: post on Twitter @edurekaIN: #askEdureka, post on Facebook /edurekaIN, or visit www.edureka.in/apache-storm
  • 3. Need for Real Time Analytics • Banking: Fraud Transaction Detection • Telecommunication: Silent Roamers Detection • Retail: Inventory Dynamic Pricing • Social Networking: Trending Topics
  • 4. Growing Interest in Apache Storm
  • 5. Storm Use Cases: Need for Real Time Analytics • Twitter Trends • Responsive Logs • Custom Magazine Feeds • Real Time Video Analytics • Enable Clinicians to Make Medical Decisions • Compare and Display Real Time Prices. Source: https://github.com/nathanmarz/storm/wiki/Powered-By
  • 6. What is Storm? • Apache Storm is a free and open source distributed real-time computation system. • Storm makes it easy to reliably process unbounded streams of data. • Storm does for real-time processing what Hadoop did for batch processing. • Storm is simple and can be used with any programming language.
  • 7. Understanding the Storm Architecture (diagram: one Nimbus node, three ZooKeeper nodes, five Supervisor nodes). *Covered in module 2 of the course
  • 8. Storm Components. A Storm cluster has 3 sets of nodes: 1. Nimbus node 2. ZooKeeper nodes 3. Supervisor nodes
    Nimbus node (master node, similar to the Hadoop JobTracker): » Uploads computations for execution » Distributes code across the cluster » Launches workers across the cluster » Monitors computation and reallocates workers as needed
    ZooKeeper nodes: » Coordinate the Storm cluster
    Supervisor nodes: » Communicate with Nimbus through ZooKeeper » Start and stop workers according to signals from Nimbus
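The division of labour on slide 8 (Nimbus launches workers across the cluster and reallocates them as needed) can be illustrated with a small simulation. This is a minimal sketch in plain Python, not Storm code: the function names, the supervisor and worker labels, and the round-robin placement are all assumptions made for illustration, and Storm's real scheduler is more involved.

```python
def assign_workers(worker_ids, supervisors):
    """Spread workers over supervisors round-robin (hypothetical policy)."""
    if not supervisors:
        raise ValueError("no live supervisors")
    assignment = {s: [] for s in supervisors}
    for i, worker in enumerate(worker_ids):
        assignment[supervisors[i % len(supervisors)]].append(worker)
    return assignment


def reassign_after_failure(assignment, dead_supervisor):
    """Move the workers of a dead supervisor onto the surviving ones."""
    survivors = [s for s in assignment if s != dead_supervisor]
    if not survivors:
        raise ValueError("no surviving supervisors")
    # Copy the surviving assignments so the input is left untouched.
    updated = {s: list(assignment[s]) for s in survivors}
    for i, worker in enumerate(assignment.get(dead_supervisor, [])):
        updated[survivors[i % len(survivors)]].append(worker)
    return updated


if __name__ == "__main__":
    plan = assign_workers(["w0", "w1", "w2", "w3"], ["sup-a", "sup-b"])
    print(plan)
    print(reassign_after_failure(plan, "sup-b"))
```

In a real cluster the supervisors learn about such assignments through ZooKeeper rather than through a direct function call.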
  • 9. Storm Topology. The work is delegated to different types of components, each responsible for a simple, specific processing task. The input stream of a Storm cluster is handled by a component called a spout. The spout passes the data to a component called a bolt, which transforms it in some way. A bolt either persists the data in some sort of storage or passes it to another bolt. (Diagram: input data source, then spouts, then bolts, then data storage.)
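The spout-to-bolt data flow on slide 9 can be sketched with ordinary Python generators. This is an illustration only and does not use the Storm API: the names `sentence_spout`, `split_bolt`, `count_bolt` and `run_topology` are hypothetical, and a word count was chosen simply because it needs one transforming bolt and one persisting bolt.

```python
from collections import Counter


def sentence_spout(sentences):
    """Spout: turns an input data source into a stream of one-field tuples."""
    for sentence in sentences:
        yield (sentence,)


def split_bolt(stream):
    """Bolt: transforms each sentence tuple into lower-cased word tuples."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word.lower(),)


def count_bolt(stream, storage):
    """Bolt: persists the data by updating counts in `storage`."""
    for (word,) in stream:
        storage[word] += 1


def run_topology(sentences):
    """Wire spout -> split bolt -> count bolt and return the storage."""
    storage = Counter()
    count_bolt(split_bolt(sentence_spout(sentences)), storage)
    return storage


if __name__ == "__main__":
    print(run_topology(["storm processes streams", "Storm is fast"]))
```

Unlike this single-process sketch, a Storm topology runs its spouts and bolts in parallel across the cluster and keeps running until it is stopped.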
  • 10. Why Storm is ideal for Real Time Processing
    » Fast: benchmarked as processing one million 100-byte messages per second per node.
    » Scalable: parallel calculations run across a cluster of machines.
    » Fault-tolerant: when workers die, Storm will automatically restart them. If a node dies, the worker will be restarted on another node.
    » Reliable: Storm guarantees that each unit of data (tuple) will be processed at least once or exactly once. Messages are only replayed when there are failures.
    » Easy to operate: standard configurations are suitable for production on day one. Once deployed, Storm is easy to operate.
    Source: http://hortonworks.com/hadoop/storm/
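The "Reliable" point on slide 10 (messages are replayed only when there are failures, so each one is processed at least once) can be illustrated as follows. This is a minimal sketch in plain Python, not Storm's actual acknowledgement mechanism: `process_at_least_once`, `make_flaky_handler` and the `max_attempts` cut-off are assumptions introduced for the example.

```python
from collections import deque


def process_at_least_once(messages, handler, max_attempts=3):
    """Run `handler` on every message, replaying a message when it fails.

    Returns (processed, dropped). `max_attempts` is a hypothetical safety
    limit so the sketch always terminates.
    """
    pending = deque((message, 1) for message in messages)
    processed, dropped = [], []
    while pending:
        message, attempt = pending.popleft()
        try:
            handler(message)
            processed.append(message)  # success: treat as acknowledged
        except Exception:
            if attempt < max_attempts:
                pending.append((message, attempt + 1))  # failure: replay later
            else:
                dropped.append(message)
    return processed, dropped


def make_flaky_handler(fail_once):
    """Build a handler that fails the first time it sees a listed message."""
    already_failed = set()
    handled = []

    def handler(message):
        if message in fail_once and message not in already_failed:
            already_failed.add(message)
            raise RuntimeError("simulated worker failure")
        handled.append(message)

    return handler, handled


if __name__ == "__main__":
    handler, handled = make_flaky_handler({"b"})
    print(process_at_least_once(["a", "b", "c"], handler))
```

Because a failed message may have been partly handled before the failure, downstream effects can happen more than once, which is why this guarantee is called at-least-once rather than exactly-once.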
  • 11. Storm in the Hadoop Framework (diagram of application types): Batch (MapReduce) • Interactive (Tez) • Online (HBase) • Streaming (Storm) • Graph (Giraph) • In-Memory (Spark) • HPC MPI (OpenMPI) • Other (Search) (Weave..). Source: http://hadoop.apache.org/docs/stable2/hadoop-yarn/hadoop-yarn-site/YARN.html
  • 12. Upcoming Batch for Storm
    Start dates: » 16th Aug (08:30 PM – 11:30 PM, India Time) / 16th Aug (08:00 AM – 11:00 AM, Pacific Time) » 13th Sep (7:00 AM – 10:00 AM, India Time) / 12th Sep (06:30 PM – 09:30 PM, Pacific Time)
    Curriculum: » Module 1: Introduction of Big Data and Storm » Module 2: Getting Started with Storm » Module 3: Spouts and Bolts » Module 4: Trident Topologies » Module 5: Real Life Storm Project – 1 » Module 6: Real Life Storm Project – 2
  • 13. Annie’s Question: Storm can be used in: - Real-time Processing - Batch Processing - Both
  • 14. Annie’s Answer: Real-time Processing
  • 15. Annie’s Question: Which of them can be a source of a stream? - Spout - Bolt - Both
  • 16. Annie’s Answer: Both
  • 17. Annie’s Question: It is not possible to run Storm processes along with MapReduce jobs inside a Hadoop cluster. - True - False
  • 18. Annie’s Answer: False. With Hadoop 2.0, it is possible.
  • 19. Annie’s Question: A Nimbus node is similar to a TaskTracker node in a Hadoop cluster. - True - False
  • 20. Annie’s Answer: False. A Nimbus node is more like a JobTracker node in Hadoop.
  • 21. Annie’s Question: A Storm topology is defined in terms of: - Nimbus, Zookeeper, Supervisor nodes - Spout, Bolt - Spout, Bolt, Nimbus, Zookeeper, Supervisor nodes - Spout, Bolt, Zookeeper node
  • 22. Annie’s Answer: Spout and Bolt
  • 23. Questions? Buy the complete course at: www.edureka.in/apache-storm
    Batch starts on: » 16th Aug 08:30 PM IST / 16th Aug 08:00 AM PDT » 13th Sep 7:00 AM IST / 12th Sep 06:30 PM PDT
    Course fee: USD 339 / INR (17795 + 12.36% Service tax)**
    For existing edureka customers (20% off): USD 271 / INR 14326