Real time analytics case study


  • 1. Real Time Analytics – Big Data Case Study
  • 2. Agenda
      Big Data
      Real Time Analytics
      Why is it needed?
      Case Study – Telecom Industry
      (Impetus Confidential)
  • 3. Big Data & Hadoop
  • 4. What is Big Data?
      Three dimensions of Big Data
        • Volume
            o Gathering and collecting terabytes (or more) of information
        • Velocity
            o Analyzing millions of trade events generated per day
        • Variety
            o Structured or unstructured data such as text, sensor data, click streams, audio, video and log files
  • 5. Big Data
      Data is key to business; it can be used for
        • User behavior analysis
        • Ad targeting
        • Trending topics
        • Recommendations
      How?
        • Hadoop is the de facto standard for batch data analytics
            o Provides a parallel computation framework (MapReduce)
            o Redundant, fault-tolerant data storage
            o Designed to store data reliably on commodity machines
            o Designed with hardware failures in mind
        • Modeled on Google's GFS and MapReduce designs
      Real Time Analytics? No
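To make the MapReduce programming model on this slide concrete, here is a minimal single-process sketch of its phases (map, then group by key, then reduce) applied to word counting. It is plain Java using java.util.stream, not the Hadoop API: in Hadoop the same roles are played by Mapper and Reducer subclasses running in parallel across the cluster, with the framework handling the grouping (shuffle) step. The class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Illustrative, single-process imitation of MapReduce word count; not the Hadoop API.
class WordCountSketch {
    // Map phase: split a line into words and emit a (word, 1) pair per word.
    static List<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\W+"))
                .filter(word -> !word.isEmpty())
                .map(word -> Map.entry(word, 1))
                .collect(Collectors.toList());
    }

    // Shuffle + reduce phases: group the pairs by word and sum the counts.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        return pairs.stream().collect(Collectors.groupingBy(
                Map.Entry::getKey,
                TreeMap::new,
                Collectors.summingInt(Map.Entry::getValue)));
    }

    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = lines.stream()
                .flatMap(line -> map(line).stream())
                .collect(Collectors.toList());
        return reduce(pairs);
    }

    public static void main(String[] args) {
        // Prints {big=2, data=2, of=1, streams=1}
        System.out.println(wordCount(List.of("big data", "Big streams of data")));
    }
}
```

Nothing here is distributed; the point is only that each map call and each per-key reduction is independent, which is what lets Hadoop spread them across machines.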
  • 6. Real Time Analytics
  • 7. What is Real Time Analytics?
      What is it?
        • Real-time analytics is the process of delivering information about events as they occur
      Some examples
        • Financial industry: fraud detection, trading
        • E-commerce: recommendations
        • Telecom industry: machine-to-machine communication
        • Supply chain management
        • Business activity monitoring
  • 8. Why is it needed?
      Time is money
        • Intra-day risk analysis in real time could translate into increased profits
      Helps organizations stay ahead of the competition
        • E-commerce: presenting information based on what a user is browsing or interested in could improve sales and the user experience
        • Content creators could produce relevant, high-quality content
  • 9. Case Study – Telecommunication Industry
  • 10. The Company, Challenge & Benefits
      Company
        • Telecom firm providing a wireless network service designed to deliver machine-to-machine communications to millions of devices
      Challenge
        • Design a near real time solution for predicting patterns based on data generated by machine-to-machine (M2M) communication and sent over the wireless network
        • The solution should support adding new near real time streams without significant changes
      Benefits
        • Enabled customers to get real time alerts for business-critical situations
        • Enabled customers to react to their critical business needs in real time
        • Improved customer experience
        • Reduced operating cost
  • 11. Examples
      Machine-to-machine communication
        • Vineyard watering
            o Spread over a huge area
            o Critical to keep the water level above a threshold
        • Vehicle tracking & geo-fencing
            o Mark the allowed radius of vehicle movement (for example, in valet parking)
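The geo-fencing example boils down to a distance check: compute the great-circle distance between a vehicle's reported position and the fence center, and raise an alert when it exceeds the allowed radius. Below is a minimal sketch using the haversine formula; the GeoFence class, the spherical-Earth radius and the sample coordinates are assumptions for illustration, not details from the case study.

```java
// Hypothetical geo-fence check using the haversine great-circle distance.
class GeoFence {
    // Mean Earth radius in meters; a spherical approximation is enough for small fences.
    static final double EARTH_RADIUS_M = 6_371_000.0;

    private final double centerLat;
    private final double centerLon;
    private final double radiusMeters;

    GeoFence(double centerLat, double centerLon, double radiusMeters) {
        this.centerLat = centerLat;
        this.centerLon = centerLon;
        this.radiusMeters = radiusMeters;
    }

    // Great-circle distance between two (lat, lon) points given in degrees.
    static double distanceMeters(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
    }

    // True when the reported position is outside the fence, i.e. an alert should fire.
    boolean isBreached(double lat, double lon) {
        return distanceMeters(centerLat, centerLon, lat, lon) > radiusMeters;
    }

    public static void main(String[] args) {
        // Hypothetical valet-parking fence: 500 m around an arbitrary point.
        GeoFence fence = new GeoFence(40.7580, -73.9855, 500);
        System.out.println(fence.isBreached(40.7585, -73.9850)); // false: about 70 m away
        System.out.println(fence.isBreached(40.7680, -73.9855)); // true: about 1.1 km north
    }
}
```

The vineyard case has the same shape with a simpler predicate, for example alerting when a water-level reading falls below the required threshold.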
  • 12. Incoming Data Attributes
      Continuous input streams
        • Events as they happen
      High data volume
        • 1,000 to 100,000 events per second
      Varied sources
        • Data coming from multiple sources
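To put the quoted rate in perspective, here is the daily event count at each end of the range, assuming the rate is sustained for a full 24 hours (86,400 seconds):

```latex
\begin{aligned}
1\,000~\tfrac{\text{events}}{\text{s}} \times 86\,400~\tfrac{\text{s}}{\text{day}} &= 8.64 \times 10^{7}~\tfrac{\text{events}}{\text{day}} \\
100\,000~\tfrac{\text{events}}{\text{s}} \times 86\,400~\tfrac{\text{s}}{\text{day}} &= 8.64 \times 10^{9}~\tfrac{\text{events}}{\text{day}}
\end{aligned}
```

That is roughly 86 million to 8.6 billion events per day.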
  • 13. Expected Goals
      Identify patterns
        • Devices sending incorrect/duplicate data
      Reliability
        • Events are processed as they happen
        • Events are not missed in case of failure
      Scalability
        • Should support increases in volume
      Capability to add more queries
        • Should be able to add more queries for a particular type of incoming stream
      Notification/alert system
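As one possible reading of the "devices sending duplicate data" goal, the sketch below remembers each device's previous payload and flags an event that repeats it within a configurable time window. The Event fields (device ID, timestamp in milliseconds, payload string), the window length and the class name are hypothetical; only the last event per device is kept, so memory grows with the number of devices rather than with the number of events.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-device duplicate detector for incoming M2M events.
class DuplicateDetector {
    // Hypothetical event shape: device ID, epoch-millis timestamp and raw payload.
    record Event(String deviceId, long timestampMs, String payload) {}

    private record LastSeen(String payload, long timestampMs) {}

    private final long windowMs;
    private final Map<String, LastSeen> lastByDevice = new HashMap<>();

    DuplicateDetector(long windowMs) {
        this.windowMs = windowMs;
    }

    // True if this event repeats the same device's previous payload within the window.
    // The device's last-seen entry is updated in either case.
    boolean isDuplicate(Event e) {
        LastSeen prev = lastByDevice.get(e.deviceId());
        boolean duplicate = prev != null
                && prev.payload().equals(e.payload())
                && e.timestampMs() - prev.timestampMs() <= windowMs;
        lastByDevice.put(e.deviceId(), new LastSeen(e.payload(), e.timestampMs()));
        return duplicate;
    }

    public static void main(String[] args) {
        DuplicateDetector detector = new DuplicateDetector(5_000); // 5-second window (assumed)
        System.out.println(detector.isDuplicate(new Event("sensor-1", 0, "level=42")));     // false
        System.out.println(detector.isDuplicate(new Event("sensor-1", 1_000, "level=42"))); // true
        System.out.println(detector.isDuplicate(new Event("sensor-1", 9_000, "level=42"))); // false: 8 s gap
    }
}
```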
  • 14. Technology Stack – What Is Needed?
      Event processing capability
        • Esper
            o Processing engine for data streams
            o SQL-like support: run queries on data streams
            o Sliding windows (time or length)
            o Pattern matching
            o Executes a large number of queries simultaneously
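A time-based sliding window can be illustrated without Esper: keep the events from the last N milliseconds, evict older ones as new events arrive, and update the aggregate. The plain-Java sketch below maintains a running average over such a window; it only illustrates the windowing idea and is not how Esper implements its data windows. The class name and the 30-second window in the usage example are assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative time-based sliding-window average; not Esper's implementation.
class TimeWindowAverage {
    private record Sample(long timestampMs, double value) {}

    private final long windowMs;
    private final Deque<Sample> samples = new ArrayDeque<>();
    private double sum = 0.0;

    TimeWindowAverage(long windowMs) {
        if (windowMs <= 0) throw new IllegalArgumentException("windowMs must be positive");
        this.windowMs = windowMs;
    }

    // Adds a value seen at timestampMs (timestamps assumed non-decreasing) and returns
    // the average of all samples whose timestamp lies in (timestampMs - windowMs, timestampMs].
    double add(long timestampMs, double value) {
        samples.addLast(new Sample(timestampMs, value));
        sum += value;
        // Evict samples that have slid out of the window; the newest sample always stays.
        while (samples.peekFirst().timestampMs() <= timestampMs - windowMs) {
            sum -= samples.removeFirst().value();
        }
        return sum / samples.size();
    }

    public static void main(String[] args) {
        TimeWindowAverage avg = new TimeWindowAverage(30_000); // 30-second window (assumed)
        System.out.println(avg.add(0, 10.0));      // 10.0
        System.out.println(avg.add(10_000, 20.0)); // 15.0
        System.out.println(avg.add(35_000, 30.0)); // 25.0: the sample at t=0 has expired
    }
}
```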
  • 15. Technology Stack – Esper
      Esper: simple steps to get started
        • Get an Esper engine instance
        • Create a statement (in Esper's Event Processing Language, EPL)
        • Register the statement with the Esper engine
        • Create a listener
        • Attach the listener to the statement
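The five steps follow a statement-plus-listener (observer) pattern. So that the flow can be read and run without the Esper dependency, here is a self-contained plain-Java imitation of it: MiniEngine, createStatement and sendEvent are hypothetical names, and a simple predicate stands in for an EPL query. In the classic (pre-8.0) Esper API the corresponding pieces are EPServiceProviderManager.getDefaultProvider(), createEPL(...) on the engine's EPAdministrator, the UpdateListener interface, EPStatement.addListener(...) and EPRuntime.sendEvent(...); newer Esper versions restructure this API, so check the documentation for the version in use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical miniature engine mirroring the Esper getting-started steps; not Esper's API.
class MiniEngine<T> {
    // A registered statement: a filter (standing in for an EPL query) plus its listeners.
    static final class Statement<E> {
        private final Predicate<E> filter;
        private final List<Consumer<E>> listeners = new ArrayList<>();

        Statement(Predicate<E> filter) {
            this.filter = filter;
        }

        // Step 5: attach a listener to the statement.
        void addListener(Consumer<E> listener) {
            listeners.add(listener);
        }

        // Notify every listener when an event matches the statement.
        void offer(E event) {
            if (filter.test(event)) listeners.forEach(l -> l.accept(event));
        }
    }

    private final List<Statement<T>> statements = new ArrayList<>();

    // Steps 2 and 3: create a statement and register it with the engine.
    Statement<T> createStatement(Predicate<T> filter) {
        Statement<T> statement = new Statement<>(filter);
        statements.add(statement);
        return statement;
    }

    // Push one event through every registered statement.
    void sendEvent(T event) {
        for (Statement<T> s : statements) s.offer(event);
    }

    public static void main(String[] args) {
        // Step 1: get an engine instance.
        MiniEngine<Double> engine = new MiniEngine<>();
        MiniEngine.Statement<Double> highPrice = engine.createStatement(price -> price > 100.0);
        // Step 4: create a listener (the lambda); step 5: attach it to the statement.
        highPrice.addListener(price -> System.out.println("High price: " + price));
        engine.sendEvent(99.5);  // no output
        engine.sendEvent(120.0); // prints "High price: 120.0"
    }
}
```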
  • 16. Technology Stack – Esper
      Esper: sample queries
        • Time-based window
            o select avg(price) from … sec)
        • Length-based window
            o select symbol, avg(price) as averagePrice from … group by symbol
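The length-based query computes avg(price) per symbol over the most recent events. A plain-Java sketch of those semantics follows: it keeps only the last N events in arrival order and groups whatever is currently in the window by symbol. The Trade record, the class name and the window length of 3 in the example are assumptions, since the window specification on the slide did not survive.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Illustrative length-based window with a per-symbol average; not Esper's implementation.
class LengthWindowGroupBy {
    // Hypothetical event type with the two fields the query uses.
    record Trade(String symbol, double price) {}

    private final int length;
    private final Deque<Trade> window = new ArrayDeque<>();

    LengthWindowGroupBy(int length) {
        this.length = length;
    }

    // Appends a trade, evicting the oldest one once the window holds more than `length` events.
    void add(Trade trade) {
        window.addLast(trade);
        if (window.size() > length) window.removeFirst();
    }

    // In spirit: select symbol, avg(price) as averagePrice ... group by symbol
    Map<String, Double> averagePriceBySymbol() {
        return window.stream().collect(Collectors.groupingBy(
                Trade::symbol, TreeMap::new, Collectors.averagingDouble(Trade::price)));
    }

    public static void main(String[] args) {
        LengthWindowGroupBy w = new LengthWindowGroupBy(3);
        w.add(new Trade("IBM", 10));
        w.add(new Trade("IBM", 20));
        w.add(new Trade("MSFT", 30));
        w.add(new Trade("IBM", 40)); // window is full, so the IBM trade at 10 is evicted
        System.out.println(w.averagePriceBySymbol()); // {IBM=30.0, MSFT=30.0}
    }
}
```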
  • 17. Technology Stack – Storm
      Data carrier for Esper
        • Storm
            o Facilitates data transfer
            o Continuous computation
            o Distributed, fault tolerant
            o Scalable, no data loss
            o Provides parallelism
            o Acking & replay capability
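Storm's acking and replay rest on a compact bookkeeping trick described in its documentation: for each spout tuple, an acker keeps one 64-bit value and XORs into it the ID of every tuple anchored into that tuple's tree, once when the tuple is created and once when it is acked. Since x ^ x == 0, the value returns to zero when every tuple in the tree has been acked (apart from a negligible chance of an accidental zero), and the spout tuple counts as fully processed; a tree that does not complete before a timeout is failed so the spout can replay it. The sketch below models only that bookkeeping; the class and method names are hypothetical and this is not Storm's acker code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of XOR-based ack tracking; not Storm's acker implementation.
class XorAckTracker {
    // Per root (spout) tuple: XOR of every tuple ID emitted into its tree and acked so far.
    private final Map<Long, Long> ackVals = new HashMap<>();

    // The spout emits the root tuple of a new tree.
    void init(long rootId, long firstTupleId) {
        ackVals.put(rootId, firstTupleId);
    }

    // A bolt acks its input tuple and reports the IDs of the tuples it anchored to it.
    // Both go into one XOR update, so the value cannot reach zero while children are pending.
    // Returns true once the tree is fully processed.
    boolean ack(long rootId, long inputTupleId, long... childTupleIds) {
        long delta = inputTupleId;
        for (long child : childTupleIds) delta ^= child;
        long val = ackVals.merge(rootId, delta, (a, b) -> a ^ b);
        if (val == 0L) {
            ackVals.remove(rootId);
            return true;
        }
        return false;
    }

    // A tree that stays pending past a timeout is the one a spout would replay.
    boolean isPending(long rootId) {
        return ackVals.containsKey(rootId);
    }

    public static void main(String[] args) {
        XorAckTracker tracker = new XorAckTracker();
        long root = 1L, t0 = 0x1234L, c1 = 0xBEEFL, c2 = 0x5555L; // Storm uses random 64-bit IDs
        tracker.init(root, t0);
        System.out.println(tracker.ack(root, t0, c1, c2)); // false: c1 and c2 still pending
        System.out.println(tracker.ack(root, c1));         // false: c2 still pending
        System.out.println(tracker.ack(root, c2));         // true: tree complete
    }
}
```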
  • 18. Technology Stack – Storm
      Basic concepts of Storm
        • Streams, spouts & bolts
        • A stream is an unbounded sequence of tuples
        • Spouts are data emitters, retrieving data from outside the Storm cluster
        • Bolts are data processors: they receive one or more streams and (potentially) emit one or more streams
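The spout-to-bolt flow can be imitated in a few lines of plain Java. The Spout and Bolt interfaces below are deliberately simplified, hypothetical stand-ins and not Storm's API (Storm's Java API centres on interfaces such as IRichSpout, with nextTuple(), and IRichBolt, with execute(Tuple)); a List<Object> plays the role of a tuple, and the threshold bolt echoes the alerting use case from the case study.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-ins for Storm concepts; not Storm's interfaces.
class MiniTopology {
    // Spout: pulls data from outside and emits tuples; returns false when the source is drained.
    interface Spout { boolean nextTuple(Consumer<List<Object>> emit); }

    // Bolt: consumes one tuple and may emit zero or more tuples.
    interface Bolt { void execute(List<Object> tuple, Consumer<List<Object>> emit); }

    // A spout reading "deviceId,value" lines from an in-memory source.
    static Spout fromLines(Iterator<String> source) {
        return emit -> {
            if (!source.hasNext()) return false;
            String[] parts = source.next().split(",");
            emit.accept(List.<Object>of(parts[0], Double.parseDouble(parts[1])));
            return true;
        };
    }

    // A bolt that forwards only readings above a limit (an alert filter).
    static Bolt above(double limit) {
        return (tuple, emit) -> {
            if ((Double) tuple.get(1) > limit) emit.accept(tuple);
        };
    }

    // Drives the spout until it is drained, feeding each tuple to the bolt and collecting its output.
    static List<List<Object>> run(Spout spout, Bolt bolt) {
        List<List<Object>> output = new ArrayList<>();
        while (spout.nextTuple(tuple -> bolt.execute(tuple, output::add))) {
            // Each call to nextTuple emits at most one tuple.
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> readings = List.of("valve-1,12.5", "valve-2,3.0", "valve-3,40.0");
        // Prints [[valve-1, 12.5], [valve-3, 40.0]]
        System.out.println(run(fromLines(readings.iterator()), above(10.0)));
    }
}
```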
  • 19. Technology Stack – Storm
      Storm cluster
        • Topology: a graph of spouts and bolts connected with stream groupings
        • Master node: runs a daemon called Nimbus
            o Distributes code across the cluster
            o Assigns tasks to machines
            o Monitors for failures
        • Worker node: runs a daemon called Supervisor
            o Listens for work assigned to its machine
            o Starts/stops worker processes
            o Executes a subset of the topology
        • Coordination between Nimbus and the Supervisors is done through ZooKeeper
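Stream groupings decide which task of a downstream bolt receives each tuple. Two common ones can be sketched as task-selection functions: shuffle grouping spreads tuples across tasks, while fields grouping sends tuples with the same value of the chosen field(s) to the same task. This is a simplified illustration with hypothetical names; Storm's internal hashing and its load-balancing strategy for shuffle grouping may differ from the random choice used here.

```java
import java.util.List;
import java.util.Random;

// Simplified task-selection functions for two Storm stream groupings (illustrative only).
class Groupings {
    // Fields grouping: tuples with equal grouping-field values always map to the same task,
    // so per-key state (e.g. per-device history) can live on a single task.
    static int fieldsGrouping(List<?> groupingFieldValues, int numTasks) {
        return Math.floorMod(groupingFieldValues.hashCode(), numTasks);
    }

    // Shuffle grouping: any task, picked at random to spread load roughly evenly.
    static int shuffleGrouping(Random random, int numTasks) {
        return random.nextInt(numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 4;
        for (String device : List.of("device-17", "device-42", "device-17")) {
            // "device-17" is routed to the same task both times.
            System.out.println(device + " -> task " + fieldsGrouping(List.of(device), numTasks));
        }
        System.out.println("shuffled -> task " + shuffleGrouping(new Random(), numTasks));
    }
}
```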
  • 20. Technology Stack – Flume
      Log data collection
        • Flume
            o Stream-oriented data flow
            o Log streaming from various sources
            o Collects, aggregates & moves data to a centralized data store
            o Distributed, reliable
            o Failover and recovery mechanism
  • 21. Technology Stack – Flume
      Flume
        • Agent: receives data from an application
        • Collector: writes data to permanent storage
        • Master: separate service controlling all the other nodes
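The agent/collector split, together with the failover mechanism listed on slide 20, can be pictured as an agent that tries its preferred collector first and falls back to the next one when delivery fails. The plain-Java sketch below shows only that generic failover pattern; Collector, FailoverAgent and the use of an exception as the failure signal are hypothetical and do not reflect Flume's classes or configuration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical agent illustrating collector failover; not Flume's API.
class FailoverAgent {
    // A collector persists one event, or throws if it is unavailable.
    interface Collector {
        void write(String event) throws Exception;
    }

    private final List<Collector> collectors; // in order of preference

    FailoverAgent(List<Collector> collectors) {
        this.collectors = List.copyOf(collectors);
    }

    // Tries each collector in turn and returns the index of the one that accepted the event.
    int send(String event) {
        for (int i = 0; i < collectors.size(); i++) {
            try {
                collectors.get(i).write(event);
                return i;
            } catch (Exception e) {
                // This collector failed; fall back to the next one in the chain.
            }
        }
        throw new IllegalStateException("all collectors failed for event: " + event);
    }

    public static void main(String[] args) {
        List<String> storage = new ArrayList<>();
        Collector primary = event -> { throw new Exception("primary collector unreachable"); };
        Collector secondary = storage::add;
        FailoverAgent agent = new FailoverAgent(List.of(primary, secondary));
        System.out.println(agent.send("GET /index.html 200")); // 1: the secondary accepted it
        System.out.println(storage);                           // [GET /index.html 200]
    }
}
```

A real agent would also buffer events and retry once a collector recovers, rather than giving up as this sketch does when every collector is down.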
  • 22. Technology Stack – Messaging
      Bridging the gap between Flume & Storm
        • Queue messaging system
            o Robust messaging
            o Flexible routing
            o Highly available
            o Keeps the Flume & Storm integration loosely coupled
        • RabbitMQ fits the requirement
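The loose coupling a queue provides comes from the producer (Flume side) and the consumer (Storm side) sharing only the queue, not each other's lifecycle or pace. In the sketch below an in-process java.util.concurrent.BlockingQueue stands in for a RabbitMQ queue; with the RabbitMQ Java client the producer would publish through Channel.basicPublish(...) and the consumer would subscribe with Channel.basicConsume(...), but that broker code is not shown. The end-of-stream marker and the class name are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// In-process stand-in for a message queue between a log producer and a stream consumer.
class QueueBridge {
    // Marker telling the consumer that the producer is done (illustrative convention).
    static final String END = "__END__";

    // Producer (Flume side): pushes log events, blocking whenever the queue is full.
    static Thread producer(BlockingQueue<String> queue, List<String> events) {
        return new Thread(() -> {
            try {
                for (String e : events) queue.put(e);
                queue.put(END);
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        });
    }

    // Consumer (Storm side): takes events at its own pace until the end marker arrives.
    static List<String> consumeAll(BlockingQueue<String> queue) throws InterruptedException {
        List<String> received = new ArrayList<>();
        for (String e = queue.take(); !e.equals(END); e = queue.take()) {
            received.add(e);
        }
        return received;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(2); // small buffer on purpose
        Thread p = producer(queue, List.of("log-1", "log-2", "log-3"));
        p.start();
        System.out.println(consumeAll(queue)); // [log-1, log-2, log-3]
        p.join();
    }
}
```

The small queue capacity makes the producer block when the consumer falls behind, a simple form of back-pressure.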
  • 23. Fitting It All Together
      [Diagram: Data Center]
  • 25. Thank You