Apache Storm - Real Time Analytics

Apache Storm - Real Time Analytics Presentation Transcript

  • 1. Introduction to Real Time Analytics using Apache Storm. Buy the complete course at www.edureka.in/apache-storm. Post your questions on Twitter @edurekaIN: #askEdureka
  • 2. Objectives of this Session • The need for Real Time Analytics: use cases • How does Storm come to the rescue? • Where does Storm fit in the Hadoop framework? • Storm Architecture: components of Storm • Quiz to reinforce your learning. For queries during the session and for the class recording, post on Twitter @edurekaIN (#askEdureka) or on Facebook /edurekaIN.
  • 3. Need for Real Time Analytics • Banking: Fraud Transaction Detection • Telecommunication: Silent Roamers Detection • Retail: Inventory Dynamic Pricing • Social Networking: Trending Topics
  • 4. Growing Interest in Apache Storm
  • 5. Storm Usecases – Need for Real Time Analytics • Twitter Trends • Responsive Logs • Custom Magazine Feeds • Real Time Video Analytics • Enable Clinicians to Make Medical Decisions • Compare and Display Real Time Prices. Source: https://github.com/nathanmarz/storm/wiki/Powered-By
  • 6. What is Storm? • Apache Storm is a free and open source distributed real-time computation system. • Storm makes it easy to reliably process unbounded streams of data. • Storm does for real-time processing what Hadoop did for batch processing. • It is simple and can be used with any programming language.
  • 7. Understanding the Storm Architecture [Diagram: one Nimbus node, three ZooKeeper nodes, five Supervisor nodes] *Covered in module 2 of the course
  • 8. Storm Components: A Storm cluster has 3 sets of nodes. 1. Nimbus node (master node, similar to the Hadoop JobTracker): » Uploads computations for execution » Distributes code across the cluster » Launches workers across the cluster » Monitors computation and reallocates workers as needed. 2. ZooKeeper nodes: » Coordinate the Storm cluster. 3. Supervisor nodes: » Communicate with Nimbus through ZooKeeper, and start and stop workers according to signals from Nimbus. [Diagram: one Nimbus node, three ZooKeeper nodes, five Supervisor nodes]
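
The division of labour on slide 8 can be illustrated with a toy model. This is a minimal sketch in plain Python, not Storm's API: the class `ToyNimbus`, its methods, and the supervisor names are hypothetical, and ZooKeeper coordination is left out entirely. It only shows the idea of a master that launches workers across supervisors and reallocates them when a supervisor is lost.

```python
class ToyNimbus:
    """Toy stand-in for a master node (hypothetical, not Storm's API)."""

    def __init__(self, supervisors):
        self.alive = list(supervisors)   # live supervisor names, in a fixed order
        self.assignments = {}            # worker id -> supervisor name

    def _least_loaded(self):
        # Count workers per live supervisor and pick the one with the fewest.
        # Ties go to the supervisor that appears first in self.alive.
        load = {name: 0 for name in self.alive}
        for supervisor in self.assignments.values():
            if supervisor in load:
                load[supervisor] += 1
        return min(self.alive, key=lambda name: load[name])

    def launch(self, worker_ids):
        # "Launches workers across the cluster."
        for worker in worker_ids:
            self.assignments[worker] = self._least_loaded()

    def supervisor_died(self, name):
        # "Monitors computation and reallocates workers as needed."
        self.alive.remove(name)
        orphaned = [w for w, s in self.assignments.items() if s == name]
        for worker in orphaned:
            del self.assignments[worker]
        self.launch(orphaned)
        return orphaned


nimbus = ToyNimbus(["sup-a", "sup-b", "sup-c"])
nimbus.launch(["w1", "w2", "w3", "w4"])
moved = nimbus.supervisor_died("sup-a")
print(moved, nimbus.assignments)
```

In a real cluster the supervisors, not Nimbus, start and stop the worker processes; the sketch collapses that step into a dictionary update.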
  • 9. Storm Topology: The work is delegated to different types of components that are each responsible for a simple specific processing task. The input stream of a Storm cluster is handled by a component called a spout. The spout passes the data to a component called a bolt, which transforms it in some way. A bolt either persists the data in some sort of storage, or passes it to some other bolt. [Diagram: an input data source, two spouts that pass data on, four bolts that transform the data, and a data storage]
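
The spout-to-bolt flow on slide 9 can be sketched with Python generators. This is an illustration under stated assumptions, not Storm code: the function names (`sentence_spout`, `split_bolt`, `count_bolt`, `run_topology`) and the word-count task are hypothetical, and everything runs in a single process rather than across a cluster.

```python
from collections import Counter


def sentence_spout(lines):
    """Spout: turns the input data source into a stream of tuples."""
    for line in lines:
        yield (line,)


def split_bolt(stream):
    """Bolt: transforms each sentence tuple and passes words to the next bolt."""
    for (sentence,) in stream:
        for word in sentence.lower().split():
            yield (word,)


def count_bolt(stream, storage):
    """Terminal bolt: persists running word counts in `storage`."""
    for (word,) in stream:
        storage[word] += 1
    return storage


def run_topology(lines):
    # Wire the components together: spout -> split bolt -> count bolt -> storage.
    storage = Counter()
    return count_bolt(split_bolt(sentence_spout(lines)), storage)


counts = run_topology(["storm processes streams", "Storm scales"])
print(counts["storm"])
```

Each component does one simple task, which mirrors the slide's point: the split bolt passes its output to another bolt, while the count bolt persists it.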
  • 10. Why Storm is ideal for Real Time Processing • Fast – benchmarked at processing one million 100-byte messages per second per node. • Scalable – with parallel calculations that run across a cluster of machines. • Fault-tolerant – when workers die, Storm will automatically restart them. If a node dies, the worker will be restarted on another node. • Reliable – Storm guarantees that each unit of data (tuple) will be processed at least once or exactly once. Messages are only replayed when there are failures. • Easy to operate – standard configurations are suitable for production on day one. Once deployed, Storm is easy to operate. Source: http://hortonworks.com/hadoop/storm/
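
The "at least once" guarantee on slide 10 can be illustrated with a small replay loop. This is a toy sketch, not Storm's actual acknowledgement mechanism: the function `process_at_least_once`, the `max_attempts` parameter, and the `flaky` handler are all hypothetical. It shows only the behaviour the slide describes, namely that a message is replayed when, and only when, its processing fails.

```python
def process_at_least_once(messages, handler, max_attempts=3):
    """Deliver each message until `handler` succeeds, replaying on failure."""
    pending = list(enumerate(messages))   # (message id, payload) awaiting an ack
    attempts = {}                         # message id -> deliveries so far
    while pending:
        msg_id, payload = pending.pop(0)
        attempts[msg_id] = attempts.get(msg_id, 0) + 1
        try:
            handler(payload)              # returning normally counts as an ack
        except Exception:
            if attempts[msg_id] >= max_attempts:
                raise                     # give up after too many failures
            pending.append((msg_id, payload))   # failure: queue for replay
    return attempts


seen = []
failed_once = set()


def flaky(payload):
    # Simulate a worker that fails the first time it sees "b".
    if payload == "b" and payload not in failed_once:
        failed_once.add(payload)
        raise RuntimeError("simulated worker failure")
    seen.append(payload)


attempts = process_at_least_once(["a", "b", "c"], flaky)
print(attempts, seen)
```

Because a failed message is delivered again, a handler with side effects may see it more than once, which is why "at least once" is a weaker guarantee than "exactly once".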
  • 11. Storm in the Hadoop Framework • MapReduce (Batch) • INTERACTIVE (Text) • ONLINE (HBase) • STORM (Streaming) • GRAPH (Giraph) • IN-MEMORY (Spark) • HPC MPI (OpenMPI) • OTHER (Search) (Weave..) Source: http://hadoop.apache.org/docs/stable2/hadoop-yarn/hadoop-yarn-site/YARN.html
  • 12. Upcoming Batch for Storm. Start Date: 16th Aug (08:30 PM – 11:30 PM, India Time) / 16th Aug (08:00 AM – 11:00 AM, Pacific Time); 13th Sep (7:00 AM – 10:00 AM, India Time) / 12th Sep (06:30 PM – 09:30 PM, Pacific Time). Curriculum: • Module 1: Introduction of Big Data and Storm • Module 2: Getting Started with Storm • Module 3: Spouts and Bolts • Module 4: Trident Topologies • Module 5: Real Life Storm Project – 1 • Module 6: Real Life Storm Project – 2
  • 13. Annie’s Question: Storm can be used in: - Real-time Processing - Batch Processing - Both
  • 14. Annie’s Answer: Real-time Processing
  • 15. Annie’s Question: Which of them can be a source of a stream? - Spout - Bolt - Both
  • 16. Annie’s Answer: Both
  • 17. Annie’s Question: It is not possible to run Storm processes along with MapReduce jobs inside a Hadoop cluster. - True - False
  • 18. Annie’s Answer: False. With Hadoop 2.0, it is possible.
  • 19. Annie’s Question: A Nimbus node is similar to a TaskTracker node in a Hadoop cluster. - True - False
  • 20. Annie’s Answer: False. A Nimbus node is more like a JobTracker node in Hadoop.
  • 21. Annie’s Question: A Storm topology is defined in terms of: - Nimbus, Zookeeper, Supervisor nodes - Spout, Bolt - Spout, Bolt, Nimbus, Zookeeper, Supervisor nodes - Spout, Bolt, Zookeeper node
  • 22. Annie’s Answer: Spout and Bolt
  • 23. Questions? Buy the complete course at www.edureka.in/apache-storm. Batch Starts On: 16th Aug 08:30 PM, IST / 16th Aug 08:00 AM, PDT; 13th Sep 7:00 AM, IST / 12th Sep 06:30 PM, PDT. Course Fee: USD 339 / INR (17795 + 12.36% Service tax)**. For existing edureka customers (20% OFF), Price: USD 271 / INR 14326