Real time analytics case study

Published in: Technology, Business
  1. Real Time Analytics – Big Data Case Study
  2. Agenda
       • Big Data
       • Real Time Analytics
       • Why is it needed?
       • Case Study – Telecom Industry
     Impetus Confidential
  3. Big Data & Hadoop
  4. What is Big Data?
     Three dimensions of Big Data
       • Volume
           o Gathering/collecting over terabytes of information
       • Velocity
           o Analyzing millions of trade events generated per day
       • Variety
           o Structured or unstructured data like text, sensor data, click streams, audio, video and log files
  5. Big Data
     Data is the key to business; it can be used for
       • User behavior analysis
       • Ad targeting
       • Trending topics
       • Recommendations
     How?
       • Hadoop is the de facto standard for batch-processing data analytics
           o Provides a parallel computation framework (MapReduce)
           o Redundant, fault-tolerant data storage
           o Designed to reliably store data using commodity machines
           o Designed with hardware failures in mind
       • Modeled on Google's GFS and MapReduce
       • Real Time Analytics? No
  6. Real Time Analytics
  7. What is Real Time Analytics?
     What is it?
       • Real-time analytics is the process of delivering information about events as they occur
     Some Examples
       • Financial Industry - Fraud Detection, Trading
       • E-commerce - Recommendations
       • Telecom Industry - Machine to Machine communication
       • Supply Chain Management
       • Business Activity Monitoring
  8. Why is it needed?
     Time is money
       • Inter-day risk analysis in real time could translate into increased profits
     Helps organizations stay ahead of the competition
       • E-commerce – presenting information based on what a user is browsing or interested in could improve sales and user experience
       • Content creators could produce relevant, high-quality content
  9. Case Study – Telecommunication Industry
  10. The Company, Challenge & Benefits
      Company
        • Telecom firm providing a wireless network service designed to deliver Machine to Machine communications to millions of devices.
      Challenge
        • Design a Near Real Time solution for predicting patterns based on data generated by Machine-to-Machine (M2M) communication and sent over the wireless network.
        • Solution should be able to support the addition of near real time streams without much change.
      Benefits
        • Enable customers to get real time alerts for business critical situations.
        • Enabled customers to react to their critical business needs in real time.
        • Improved Customer Experience.
        • Reduced operating cost.
  11. Examples
      Machine to Machine Communication
        • Vineyard watering
            o Spread over a huge area
            o Critical to maintain water level threshold
        • Vehicle Tracking & Geo-fencing
            o Mark the radius of vehicle movement (in case of valet parking)
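The two M2M examples on this slide both reduce to a simple check on each incoming reading. The sketch below is illustrative only and is not taken from the case study: the function names, the coordinate format (latitude and longitude in degrees) and the use of the haversine great-circle distance are all assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def outside_geofence(center, position, radius_m):
    """Geo-fencing: True when the vehicle position is beyond the allowed radius."""
    return haversine_m(*center, *position) > radius_m


def water_level_alert(level, threshold):
    """Vineyard watering: True when the measured level drops below the threshold."""
    return level < threshold


# Hypothetical readings: a parking lot centre and a car about 1.1 km north of it.
lot = (38.0, -122.0)
car = (38.01, -122.0)
car_has_left = outside_geofence(lot, car, radius_m=500)
needs_water = water_level_alert(level=12.5, threshold=20.0)
```

In a streaming deployment these checks would run once per event, so they are kept stateless and cheap.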
  12. Incoming Data Attributes
      Continuous input streams
        • Events as they happen
      High data volume
        • 1000-100000 events per second
      Varied sources
        • Data coming from multiple sources
  13. Expected Goals
      Identify patterns
        • Devices sending incorrect/duplicate data
      Reliability
        • Events are processed as they happen
        • Events are not missed in case of failure
      Scalability
        • Should be able to support increase in volume
      Capability to add more queries
        • Should be able to add more queries for a particular type of incoming stream
      Notification / Alerts System
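One of the goals above is spotting devices that send duplicate data. A minimal sketch of one way to do that with bounded memory follows. It assumes each event carries a device identifier and a sequence number; both field names are hypothetical, since the slides do not describe the event schema.

```python
from collections import OrderedDict


class DuplicateDetector:
    """Flags events whose (device_id, sequence) key was seen recently.

    Only the most recent `capacity` keys are remembered, so memory stays
    bounded even on an unbounded stream.
    """

    def __init__(self, capacity=10_000):
        self.capacity = capacity
        self._seen = OrderedDict()  # keys ordered from least to most recently seen

    def is_duplicate(self, device_id, sequence):
        key = (device_id, sequence)
        if key in self._seen:
            self._seen.move_to_end(key)  # refresh recency
            return True
        self._seen[key] = None
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)  # evict the least recently seen key
        return False


detector = DuplicateDetector(capacity=2)
flags = [
    detector.is_duplicate("meter-1", 1),  # first sighting
    detector.is_duplicate("meter-1", 1),  # repeat of the same event
    detector.is_duplicate("meter-1", 2),
]
```

The trade-off is that a duplicate arriving after its key has been evicted goes unnoticed, so the capacity should be sized to the expected retransmission delay.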
  14. Technology Stack – What all is needed?
      Event Processing capability
        • Esper
            o Processing engine for data streams
            o SQL-like support – run queries on data streams
            o Sliding windows (time or length)
            o Pattern matching
            o Executes a large number of queries simultaneously
  15. Technology Stack – Esper
      Esper - Simple steps to get started
        • Get an Esper instance
        • Create a statement (Esper Query Language)
        • Register the statement with the Esper engine
        • Create a listener
        • Attach the listener to the statement
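The steps above follow a register-then-listen pattern. The toy engine below mirrors that flow in plain Python so the sequence is easy to see. It is not Esper's API: Esper is a Java library whose statements are written in its query language, whereas here a statement is just a Python predicate and every class and method name is invented for illustration.

```python
class Statement:
    """A registered query: a name, a match predicate and attached listeners."""

    def __init__(self, name, predicate):
        self.name = name
        self.predicate = predicate
        self.listeners = []

    def add_listener(self, listener):
        self.listeners.append(listener)


class ToyEngine:
    """Stand-in for an event processing engine instance."""

    def __init__(self):
        self.statements = []

    def create_statement(self, name, predicate):
        # Creating a statement also registers it with the engine.
        statement = Statement(name, predicate)
        self.statements.append(statement)
        return statement

    def send_event(self, event):
        # Every registered statement sees every event.
        for statement in self.statements:
            if statement.predicate(event):
                for listener in statement.listeners:
                    listener(statement.name, event)


engine = ToyEngine()                                      # get an engine instance
high_price = engine.create_statement(                     # create and register a statement
    "high_price", lambda event: event["price"] > 100
)
alerts = []
high_price.add_listener(                                  # create and attach a listener
    lambda name, event: alerts.append((name, event["symbol"]))
)
engine.send_event({"symbol": "ABC", "price": 120.0})      # matches, listener fires
engine.send_event({"symbol": "XYZ", "price": 80.0})       # does not match
```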
  16. Technology Stack – Esper
      Esper – Sample Queries
        • Time based window
            select avg(price) from StockTickEvent.win:time(30 sec)
        • Length based window
            select symbol, avg(price) as averagePrice from StockTickEvent.win:length(100) group by symbol
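To make the two window types concrete, the following sketch approximates what the sample queries compute, in plain Python rather than Esper. It rests on two simplifying assumptions: events carry their own timestamps in seconds, and old events are expired only when a new event arrives, whereas a real engine also expires them as time passes. Class names are illustrative.

```python
from collections import defaultdict, deque


class TimeWindowAverage:
    """Average of values received within the last `span_s` seconds."""

    def __init__(self, span_s):
        self.span_s = span_s
        self._items = deque()  # (timestamp, value), oldest first
        self._total = 0.0

    def add(self, timestamp, value):
        self._items.append((timestamp, value))
        self._total += value
        # Drop entries that are span_s or more older than the newest event.
        while self._items[0][0] <= timestamp - self.span_s:
            _, old_value = self._items.popleft()
            self._total -= old_value
        return self._total / len(self._items)


class LengthWindowGroupedAverage:
    """Per-symbol average over the last `length` events across all symbols."""

    def __init__(self, length):
        self.length = length
        self._window = deque()  # (symbol, price), oldest first
        self._sum = defaultdict(float)
        self._count = defaultdict(int)

    def add(self, symbol, price):
        self._window.append((symbol, price))
        self._sum[symbol] += price
        self._count[symbol] += 1
        if len(self._window) > self.length:
            old_symbol, old_price = self._window.popleft()
            self._sum[old_symbol] -= old_price
            self._count[old_symbol] -= 1
        # Report only symbols that still have events inside the window.
        return {s: self._sum[s] / c for s, c in self._count.items() if c > 0}


by_time = TimeWindowAverage(span_s=30)
by_time.add(0, 10.0)
by_time.add(10, 20.0)
latest_avg = by_time.add(35, 30.0)  # the event at t=0 has left the window

by_length = LengthWindowGroupedAverage(length=3)
by_length.add("A", 10.0)
by_length.add("B", 20.0)
by_length.add("A", 30.0)
grouped = by_length.add("B", 40.0)  # evicts the oldest event, ("A", 10.0)
```

Both classes keep running sums so each new event costs constant time apart from the evictions it triggers.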
  17. Technology Stack - Storm
      Data Carrier for Esper
        • Storm
            o Facilitates data transfer
            o Continuous computation
            o Distributed, fault tolerant
            o Scalable, no data loss
            o Provides parallelism
            o Acking & replay capability
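The acking and replay capability is what backs the "no data loss" claim: a message stays tracked until it has been acknowledged, and a failed message is emitted again. The single-process sketch below shows the idea only. The method names `next_tuple`, `ack` and `fail` echo the callbacks of a Storm spout, but the classes, the retry policy and the flaky processor are assumptions made for illustration.

```python
from collections import deque


class ReplayingSource:
    """Emits messages and re-queues any message reported as failed."""

    def __init__(self, messages):
        self._queue = deque(enumerate(messages))  # (msg_id, payload)
        self._pending = {}                        # emitted but not yet acked

    def next_tuple(self):
        if not self._queue:
            return None
        msg_id, payload = self._queue.popleft()
        self._pending[msg_id] = payload
        return msg_id, payload

    def ack(self, msg_id):
        self._pending.pop(msg_id, None)  # fully processed, safe to forget

    def fail(self, msg_id):
        # Put the message back at the end of the queue for another attempt.
        self._queue.append((msg_id, self._pending.pop(msg_id)))


def run(source, process):
    """Drain the source, acking successes and failing errors."""
    results = []
    while True:
        item = source.next_tuple()
        if item is None:
            break
        msg_id, payload = item
        try:
            results.append(process(payload))
        except RuntimeError:
            source.fail(msg_id)
        else:
            source.ack(msg_id)
    return results


failed_once = set()


def flaky_double(value):
    """Hypothetical processor that fails the first time it sees the value 2."""
    if value == 2 and value not in failed_once:
        failed_once.add(value)
        raise RuntimeError("transient failure")
    return value * 2


results = run(ReplayingSource([1, 2, 3]), flaky_double)
```

Replay gives at-least-once processing, so results can arrive out of order and downstream steps should tolerate a message being seen more than once. A processor that always fails would loop forever here; a production design would cap retries.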
  18. Technology Stack - Storm
      Basic concepts of Storm
        • Streams, Spouts & Bolts
        • A stream is an unbounded sequence of tuples
        • Spouts are data emitters, retrieving data from outside the Storm cluster
        • Bolts are data processors: they receive one or more streams and (potentially) emit one or more new streams
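The relationship between streams, spouts and bolts can be sketched with Python generators: a spout yields tuples, and each bolt consumes one stream and yields another. This is a single-process analogy rather than Storm code, with no distribution or parallelism, and the tuple fields are hypothetical.

```python
def sensor_spout(readings):
    """Spout: turns data from an outside source into a stream of tuples."""
    for device_id, value in readings:
        yield {"device_id": device_id, "value": value}


def validate_bolt(stream, low, high):
    """Bolt: passes on only the tuples whose value is in the plausible range."""
    for tup in stream:
        if low <= tup["value"] <= high:
            yield tup


def count_bolt(stream):
    """Bolt: emits a running count of valid readings per device."""
    counts = {}
    for tup in stream:
        device_id = tup["device_id"]
        counts[device_id] = counts.get(device_id, 0) + 1
        yield device_id, counts[device_id]


# Wire the pieces together: spout -> validate -> count.
readings = [("a", 5), ("b", 500), ("a", 7)]
topology = count_bolt(validate_bolt(sensor_spout(readings), low=0, high=100))
emitted = list(topology)
```

The input list is finite only so the example terminates; the same generators work unchanged on a source that never ends, because each tuple is processed as it arrives.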
  19. Technology Stack - Storm
      Storm Cluster
        • Topology - A graph of spouts and bolts that are connected with stream groupings
        • Master Node – Runs a daemon called Nimbus
            o Distributes code across the cluster
            o Assigns tasks to machines
            o Monitors failures
        • Worker Node - Runs a daemon called Supervisor
            o Listens for work assigned
            o Starts/stops worker processes
            o Executes a subset of the topology
        • Coordination between Nimbus and the Supervisors is done through ZooKeeper
  20. Technology Stack - Flume
      Log Data Collection
        • Flume
            o Stream oriented data flow
            o Log streaming from various sources
            o Collect, aggregate & move data to a centralized data store
            o Distributed, reliable
            o Failover and recovery mechanism
  21. Technology Stack - Flume
      Flume
        • Agent - Receives data from an application
        • Collector – Writes data to permanent storage
        • Master – Separate service controlling all the other nodes
  22. Technology Stack - Messaging
      Bridging the gap between Flume & Storm
        • Queue Messaging System
            o Robust messaging
            o Flexible routing
            o Highly available
            o Makes Flume & Storm integration loosely coupled
        • RabbitMQ fits the requirement
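The loose coupling a queue provides can be shown with the Python standard library: the producer and the consumer share only the queue and know nothing about each other. In this sketch an in-process `queue.Queue` stands in for the message broker, the producer plays the role of the log collector and the consumer plays the role of the stream processor. It has none of a real broker's durability, routing or availability features, and the function names are illustrative.

```python
import queue
import threading

STOP = object()  # sentinel telling the consumer that no more events will come


def producer(q, lines):
    """Collector side: publishes each log line to the queue."""
    for line in lines:
        q.put(line)  # blocks while the queue is full (back-pressure)
    q.put(STOP)


def consumer(q, out):
    """Processing side: pulls events off the queue at its own pace."""
    while True:
        item = q.get()
        if item is STOP:
            break
        out.append(item.upper())


# A small bound makes the producer wait whenever the consumer falls behind.
q = queue.Queue(maxsize=2)
received = []
producer_thread = threading.Thread(target=producer, args=(q, ["e1", "e2", "e3"]))
consumer_thread = threading.Thread(target=consumer, args=(q, received))
consumer_thread.start()
producer_thread.start()
producer_thread.join()
consumer_thread.join()
```

Because either side can be replaced or restarted without changing the other, new stream sources can be added by publishing to the same queue.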
  23. Fitting it all together
      [Architecture diagram; the only label recovered is "Data Center"]
  24. References
      Esper
        • http://esper.codehaus.org/
      Storm
        • https://github.com/nathanmarz/storm
        • https://github.com/tomdz/storm-esper
      Flume
        • http://archive.cloudera.com/cdh/3/flume/UserGuide/#_architecture
      Queue Messaging System
        • http://www.rabbitmq.com/
  25. Thank You