Real time analytics case study
Published in: Technology, Business

  • 1. Real Time Analytics – Big Data Case Study
  • 2. Agenda
      ▪ Big Data
      ▪ Real Time Analytics
      ▪ Why is it needed?
      ▪ Case Study – Telecom Industry
      Impetus Confidential
  • 3. Big Data & Hadoop
  • 4. What is Big Data?
      ▪ Three dimensions of Big Data
          • Volume
              o Gathering/collecting terabytes of information or more
          • Velocity
              o Analyzing millions of trade events generated per day
          • Variety
              o Structured or unstructured data such as text, sensor data, click streams, audio, video and log files
  • 5. Big Data
      ▪ Data is key to business; it can be used for
          • User behavior analysis
          • Ad targeting
          • Trending topics
          • Recommendations
      ▪ How?
          • Hadoop is the de facto standard for batch data analytics
              o Provides a parallel computation framework (MapReduce)
              o Redundant, fault-tolerant data storage
              o Designed to store data reliably on commodity machines
              o Designed with hardware failures in mind
          • Modeled on Google's GFS and MapReduce papers
          • Real-time analytics? No
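The MapReduce model behind Hadoop can be illustrated with a single-process word-count sketch. This is plain Python, not the Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and in Hadoop the shuffle and the parallel execution across commodity machines are handled by the framework itself.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Mapper: emit a (word, 1) pair for every word in one input record
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Group intermediate values by key (the framework does this between phases)
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reducer: combine all values collected for one key
    return key, sum(values)


def word_count(lines):
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big data analytics"]))
```

The whole input must be available before the reduce step can finish, which is why this batch model answers the slide's last question with "No" for real-time analytics.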
  • 6. Real Time Analytics
  • 7. What is Real Time Analytics?
      ▪ What is it?
          • Real-time analytics is the process of delivering information about events as they occur
      ▪ Some examples
          • Financial Industry – Fraud Detection, Trading
          • E-commerce – Recommendations
          • Telecom Industry – Machine-to-Machine (M2M) communication
          • Supply Chain Management
          • Business Activity Monitoring
  • 8. Why is it needed?
      ▪ Time is money
          • Inter-day risk analysis in real time could translate into increased profits
      ▪ Helps organizations stay ahead of the competition
          • E-commerce – surfacing information based on what a user is browsing or interested in can improve sales and user experience
          • Content creators can produce relevant, quality content
  • 9. Case Study – Telecommunication Industry
  • 10. The Company, Challenge & Benefits
      ▪ Company
          • Telecom firm providing a wireless network service designed to deliver Machine-to-Machine communications to millions of devices
      ▪ Challenge
          • Design a near-real-time solution for predicting patterns based on data generated by Machine-to-Machine (M2M) communication and sent over the wireless network
          • The solution should support adding new near-real-time streams without much change
      ▪ Benefits
          • Enabled customers to get real-time alerts for business-critical situations
          • Enabled customers to react to their critical business needs in real time
          • Improved customer experience
          • Reduced operating cost
  • 11. Examples
      ▪ Machine-to-Machine communication
          • Vineyard watering
              o Spread over a huge area
              o Critical to maintain the water-level threshold
          • Vehicle tracking & geo-fencing
              o Mark the permitted radius of vehicle movement (e.g., for valet parking)
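Both examples reduce to simple per-event checks. A minimal sketch, assuming a circular geo-fence defined by a centre point and a radius in metres, and a single numeric water-level reading per sensor; `haversine_m`, `outside_geofence` and `below_threshold` are hypothetical helpers, not part of any product named in this deck.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (assumed spherical Earth)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def outside_geofence(center, radius_m, position):
    """True if a vehicle position has left the circular fence around `center`."""
    return haversine_m(*center, *position) > radius_m


def below_threshold(level, threshold):
    """Vineyard-style check: alert when a sensor reading drops below its threshold."""
    return level < threshold


if __name__ == "__main__":
    parking_spot = (0.0, 0.0)
    print(outside_geofence(parking_spot, 100, (0.0, 0.001)))  # roughly 111 m away
    print(below_threshold(0.2, 0.3))
```

Each check runs on a single event, so it can be evaluated as readings arrive rather than in a nightly batch.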
  • 12. Incoming Data Attributes
      ▪ Continuous input streams
          • Events as they happen
      ▪ High data volume
          • 1,000–100,000 events per second
      ▪ Varied sources
          • Data coming from multiple sources
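To put the quoted rates in perspective, they can be converted to daily volumes, assuming the rate is sustained over a full 24-hour day (86,400 s):

```latex
\begin{aligned}
1\,000\ \text{events/s} \times 86\,400\ \text{s/day} &= 8.64 \times 10^{7}\ \text{events/day} \\
100\,000\ \text{events/s} \times 86\,400\ \text{s/day} &= 8.64 \times 10^{9}\ \text{events/day}
\end{aligned}
```

At the upper end that is billions of events per day, which is the volume the processing stack on the following slides has to absorb continuously.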
  • 13. Expected Goals
      ▪ Identify patterns
          • Devices sending incorrect/duplicate data
      ▪ Reliability
          • Events are processed as they happen
          • Events are not missed in case of failure
      ▪ Scalability
          • Should be able to support increases in volume
      ▪ Capability to add more queries
          • Should be able to add more queries for a particular type of incoming stream
      ▪ Notification/alert system
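Spotting devices that send duplicate data can be sketched as a keyed lookup with a time tolerance. This assumes a "duplicate" means the same device sending the same payload within a configurable window; `Event`, `DuplicateDetector` and `window_s` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    device_id: str
    payload: str
    timestamp: float  # seconds


class DuplicateDetector:
    """Flags an event as a duplicate if the same device sent the same payload
    within `window_s` seconds of the last accepted copy."""

    def __init__(self, window_s):
        self.window_s = window_s
        # (device_id, payload) -> timestamp of the last accepted (non-duplicate) copy
        self._last_seen = {}

    def is_duplicate(self, event):
        key = (event.device_id, event.payload)
        last = self._last_seen.get(key)
        if last is not None and event.timestamp - last <= self.window_s:
            return True
        self._last_seen[key] = event.timestamp
        return False


if __name__ == "__main__":
    detector = DuplicateDetector(window_s=5.0)
    for e in [Event("d1", "42", 0.0), Event("d1", "42", 3.0), Event("d1", "42", 10.0)]:
        print(e.timestamp, detector.is_duplicate(e))
```

The `_last_seen` map grows with the number of distinct (device, payload) pairs; a long-running version would evict entries older than the window.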
  • 14. Technology Stack – What is needed?
      ▪ Event processing capability
          • Esper
              o Processing engine for data streams
              o SQL-like query language: run queries on data streams
              o Sliding windows (time or length)
              o Pattern matching
              o Executes a large number of queries simultaneously
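A time-based sliding window keeps only the events from the last N seconds and updates an aggregate as events arrive and expire. The sketch below computes a running average that way; it illustrates the concept behind Esper's time windows and is not Esper code (`TimeWindowAverage` and `span_s` are hypothetical).

```python
from collections import deque


class TimeWindowAverage:
    """Average of the values seen in the last `span_s` seconds (time-based sliding window)."""

    def __init__(self, span_s):
        self.span_s = span_s
        self._events = deque()  # (timestamp, value) pairs, oldest first
        self._total = 0.0

    def add(self, timestamp, value):
        """Add an event and return the average over the current window."""
        self._events.append((timestamp, value))
        self._total += value
        self._expire(timestamp)
        return self._total / len(self._events)

    def _expire(self, now):
        # Drop events that have slid out of the window (at least span_s seconds old)
        while self._events and now - self._events[0][0] >= self.span_s:
            _, old_value = self._events.popleft()
            self._total -= old_value


if __name__ == "__main__":
    window = TimeWindowAverage(span_s=30)
    print(window.add(0, 10.0), window.add(10, 20.0), window.add(35, 30.0))
```

Keeping a running total means each update only touches the events that expire, instead of re-summing the whole window.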
  • 15. Technology Stack – Esper
      ▪ Esper – simple steps to get started
          • Get an Esper engine instance
          • Create a statement in EPL (Event Processing Language)
          • Register the statement with the Esper engine
          • Create a listener
          • Attach the listener to the statement
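The five steps follow a familiar register-and-callback pattern. The toy engine below mirrors them one by one in Python; `MiniEngine`, `Statement`, `register` and `add_listener` are hypothetical, statements are plain Python predicates rather than EPL, and Esper's real API is Java and differs in detail.

```python
from typing import Callable, Dict, List

Event = dict
Listener = Callable[[Event], None]


class Statement:
    """A continuous query: a filter over incoming events plus its listeners."""

    def __init__(self, name: str, predicate: Callable[[Event], bool]):
        self.name = name
        self.predicate = predicate
        self.listeners: List[Listener] = []

    def add_listener(self, listener: Listener) -> None:
        self.listeners.append(listener)


class MiniEngine:
    """Toy stand-in for a CEP engine instance (not Esper's API)."""

    def __init__(self):
        self.statements: Dict[str, Statement] = {}

    def register(self, statement: Statement) -> None:
        self.statements[statement.name] = statement

    def send_event(self, event: Event) -> None:
        # Each incoming event is evaluated against every registered statement,
        # and matching events are pushed to that statement's listeners.
        for statement in self.statements.values():
            if statement.predicate(event):
                for listener in statement.listeners:
                    listener(event)


if __name__ == "__main__":
    engine = MiniEngine()                                             # 1. get an engine instance
    high_price = Statement("high_price", lambda e: e["price"] > 100)  # 2. create a statement
    engine.register(high_price)                                       # 3. register it
    alerts: List[Event] = []
    listener = alerts.append                                          # 4. create a listener
    high_price.add_listener(listener)                                 # 5. attach it
    engine.send_event({"symbol": "ABC", "price": 120})
    engine.send_event({"symbol": "XYZ", "price": 80})
    print(alerts)  # only the ABC event matched
```

Because statements are evaluated independently, adding a new query for the same stream is just another `register` call, which matches the "add more queries" goal on slide 13.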
  • 16. Technology Stack – Esper
      ▪ Esper – sample queries
          • Time-based window: select avg(price) from … sec)
          • Length-based window: select symbol, avg(price) as averagePrice from … group by symbol
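The length-based query groups by `symbol` and averages `price` over the most recent events. A minimal sketch of those semantics, assuming the window holds the last `size` events across all symbols and the averages are computed per symbol over whatever is currently in the window; `LengthWindowGroupedAverage` is a hypothetical name, not Esper code.

```python
from collections import defaultdict, deque


class LengthWindowGroupedAverage:
    """avg(price) per symbol over the last `size` events (length-based window)."""

    def __init__(self, size):
        # Holds the most recent `size` (symbol, price) pairs; older ones fall off
        self._window = deque(maxlen=size)

    def add(self, symbol, price):
        self._window.append((symbol, price))  # deque drops the oldest entry automatically
        return self.averages()

    def averages(self):
        sums, counts = defaultdict(float), defaultdict(int)
        for sym, price in self._window:
            sums[sym] += price
            counts[sym] += 1
        return {sym: sums[sym] / counts[sym] for sym in sums}


if __name__ == "__main__":
    w = LengthWindowGroupedAverage(size=3)
    for sym, price in [("A", 10), ("B", 20), ("A", 30), ("B", 40)]:
        print(w.add(sym, price))
```

Recomputing the averages on every event costs O(size); an engine would maintain per-group sums incrementally instead.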
  • 17. Technology Stack – Storm
      ▪ Data carrier for Esper
          • Storm
              o Facilitates data transfer
              o Continuous computation
              o Distributed, fault tolerant
              o Scalable, no data loss
              o Provides parallelism
              o Acking & replay capability
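Acking and replay can be sketched as a spout that remembers every emitted message until it is acknowledged. The method names echo the `nextTuple`, `ack` and `fail` callbacks of Storm's spout interface, but this is a stand-alone Python illustration: here `next_tuple` returns the message instead of emitting it through a collector, and `ReliableSpout` is hypothetical.

```python
import itertools
from collections import deque

_END = object()  # marks an exhausted source


class ReliableSpout:
    """Keeps each emitted message until it is acked; failed messages are replayed."""

    def __init__(self, source):
        self._source = iter(source)
        self._pending = {}       # msg_id -> message, emitted but not yet acked
        self._replay = deque()   # msg_ids that failed and must be re-emitted
        self._ids = itertools.count()

    def next_tuple(self):
        """Return (msg_id, message), replaying failed messages first; None when idle."""
        while self._replay:
            msg_id = self._replay.popleft()
            if msg_id in self._pending:  # skip ids that were acked after failing
                return msg_id, self._pending[msg_id]
        message = next(self._source, _END)
        if message is _END:
            return None
        msg_id = next(self._ids)
        self._pending[msg_id] = message
        return msg_id, message

    def ack(self, msg_id):
        self._pending.pop(msg_id, None)  # fully processed downstream; safe to forget

    def fail(self, msg_id):
        if msg_id in self._pending:
            self._replay.append(msg_id)

    def pending_count(self):
        return len(self._pending)
```

Because a failed message is emitted again, downstream bolts may see it more than once; this is at-least-once delivery, so processing should tolerate duplicates.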
  • 18. Technology Stack – Storm
      ▪ Basic concepts of Storm
          • Streams, spouts & bolts
          • A stream is an unbounded sequence of tuples
          • Spouts are data emitters, retrieving data from outside the Storm cluster
          • Bolts are data processors: they receive one or more streams and (potentially) emit one or more new streams
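Spouts and bolts can be modelled as chained generators: the spout yields tuples from an external source, and each bolt consumes one stream and yields another. This sketch runs in a single process and ignores distribution; `sensor_spout`, `validate_bolt`, `alert_bolt` and the `limit` parameter are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

Reading = Tuple[str, float]  # (device_id, value)


def sensor_spout(readings: Iterable[Reading]) -> Iterator[Reading]:
    """Spout: pulls data from outside the system and emits it as a stream of tuples."""
    for reading in readings:
        yield reading


def validate_bolt(stream: Iterator[Reading]) -> Iterator[Reading]:
    """Bolt: consumes a stream and emits only well-formed readings."""
    for device_id, value in stream:
        if device_id and value >= 0:
            yield device_id, value


def alert_bolt(stream: Iterator[Reading], limit: float) -> Iterator[str]:
    """Bolt: consumes validated readings and emits a stream of alerts."""
    for device_id, value in stream:
        if value > limit:
            yield f"ALERT {device_id}: {value}"


if __name__ == "__main__":
    raw = [("d1", 5.0), ("", 9.0), ("d2", 12.5), ("d3", -1.0)]
    for alert in alert_bolt(validate_bolt(sensor_spout(raw)), limit=10.0):
        print(alert)
```

Generators are lazy, so the same chain works on an unbounded source; a finite list stands in for it here.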
  • 19. Technology Stack – Storm
      ▪ Storm cluster
          • Topology – a graph of spouts and bolts connected with stream groupings
          • Master node – runs a daemon called Nimbus
              o Distributes code across the cluster
              o Assigns tasks to machines
              o Monitors for failures
          • Worker node – runs a daemon called Supervisor
              o Listens for assigned work
              o Starts/stops worker processes
              o Executes a subset of the topology
          • Coordination between Nimbus and the Supervisors is done through ZooKeeper
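The stream groupings that connect spouts and bolts decide which task receives each tuple. A fields grouping, for example, sends every tuple with the same value of a chosen field to the same task. The sketch below shows the idea with a stable CRC32 hash; Storm's actual hashing differs, and `fields_grouping` is a hypothetical helper.

```python
import zlib
from collections import defaultdict


def fields_grouping(tuples, key_field, num_tasks):
    """Route each tuple to a task index by hashing one field, so all tuples
    with the same key value always reach the same task."""
    assignment = defaultdict(list)
    for tup in tuples:
        key = str(tup[key_field]).encode("utf-8")
        # crc32 is stable across runs, unlike Python's salted built-in hash()
        task = zlib.crc32(key) % num_tasks
        assignment[task].append(tup)
    return dict(assignment)


if __name__ == "__main__":
    events = [{"device": "d1", "v": 1}, {"device": "d2", "v": 2}, {"device": "d1", "v": 3}]
    for task, items in sorted(fields_grouping(events, "device", 4).items()):
        print(task, items)
```

Routing by device id this way lets a bolt task keep per-device state locally, because no other task ever sees that device.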
  • 20. Technology Stack – Flume
      ▪ Log data collection
          • Flume
              o Stream-oriented data flow
              o Log streaming from various sources
              o Collects, aggregates & moves data to a centralized data store
              o Distributed, reliable
              o Failover and recovery mechanism
  • 21. Technology Stack – Flume
      ▪ Flume
          • Agent – receives data from an application
          • Collector – writes data to permanent storage
          • Master – separate service controlling all the other nodes
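The agent/collector split, together with Flume's failover mechanism, can be sketched as an agent that tries its collectors in order. This is a conceptual Python model, not Flume's configuration or API: `Agent`, `Collector` and the `healthy` flag are hypothetical, "permanent storage" is an in-memory list, and the master node is not modelled.

```python
class Collector:
    """Writes events to storage (an in-memory list stands in for permanent storage)."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.storage = []

    def write(self, event):
        if not self.healthy:
            raise ConnectionError(f"collector {self.name} unavailable")
        self.storage.append(event)


class Agent:
    """Receives events from an application and forwards each one to the first
    collector that accepts it, failing over to the next collector on error."""

    def __init__(self, collectors):
        self.collectors = list(collectors)

    def send(self, event):
        for collector in self.collectors:
            try:
                collector.write(event)
                return collector.name  # report which collector stored the event
            except ConnectionError:
                continue  # fail over to the next collector in the list
        raise RuntimeError("no collector available")


if __name__ == "__main__":
    agent = Agent([Collector("primary", healthy=False), Collector("backup")])
    print(agent.send("GET /index.html 200"))
```

Ordering the collectors gives a simple primary/backup chain; a real deployment would also retry the primary once it recovers.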
  • 22. Technology Stack – Messaging
      ▪ Bridging the gap between Flume & Storm
          • Queue messaging system
              o Robust messaging
              o Flexible routing
              o Highly available
              o Makes the Flume & Storm integration loosely coupled
          • RabbitMQ fits the requirement
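The loose coupling a message queue provides can be shown with an in-process queue: the producer (standing in for Flume) and the consumer (standing in for Storm) share nothing but the queue. This uses Python's `queue.Queue` as a stand-in for RabbitMQ, so it has none of RabbitMQ's durability, routing or clustering; `producer`, `consumer`, `run` and the `STOP` sentinel are hypothetical.

```python
import queue
import threading

STOP = object()  # sentinel telling the consumer to finish


def producer(q, events):
    """Flume side: publishes events without knowing who consumes them."""
    for event in events:
        q.put(event)  # blocks if the bounded queue is full
    q.put(STOP)


def consumer(q, results):
    """Storm side: pulls events from the queue at its own pace."""
    while True:
        event = q.get()
        if event is STOP:
            break
        results.append(event.upper())  # placeholder for real processing


def run(events):
    q = queue.Queue(maxsize=100)  # bounded buffer between the two systems
    results = []
    worker = threading.Thread(target=consumer, args=(q, results))
    worker.start()
    producer(q, events)
    worker.join()
    return results


if __name__ == "__main__":
    print(run(["a", "b", "c"]))
```

The bounded `maxsize` makes the producer block when the consumer falls behind, a simple form of back-pressure; either side can be replaced without changing the other.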
  • 23. Fitting it all together (architecture diagram; only the label "Data Center" is recoverable)
  • 25. Thank You