Real time analytics case study

Transcript

  • 1. Real Time Analytics – Big Data Case Study
  • 2. Agenda
        • Big Data
        • Real Time Analytics
        • Why is it needed?
        • Case Study – Telecom Industry
      Impetus Confidential
  • 3. Big Data & Hadoop
  • 4. What is Big Data?
      Three dimensions of Big Data
        • Volume
            o Gathering/collecting over terabytes of information
        • Velocity
            o Analyzing millions of trade events generated per day
        • Variety
            o Structured or unstructured data such as text, sensor data, click streams, audio, video and log files
  • 5. Big Data
      Data is the key to business; it could be used for
        • User behavior analysis
        • Ad targeting
        • Trending topics
        • Recommendations
      How?
        • Hadoop is the de facto standard for batch-processing data analytics
            o Provides a parallel computation framework (MapReduce)
            o Redundant, fault-tolerant data storage
            o Designed to reliably store data using commodity machines
            o Designed with hardware failures in mind
        • Based on Google's GFS and MapReduce designs
        • Suitable for real-time analytics? No
  • 6. Real Time Analytics
  • 7. What is Real Time Analytics?
      What is it?
        • Real-time analytics is the process of delivering information about events as they occur
      Some examples
        • Financial industry - fraud detection, trading
        • E-commerce - recommendations
        • Telecom industry - Machine to Machine communication
        • Supply chain management
        • Business activity monitoring
  • 8. Why is it needed?
      Time is money
        • Inter-day risk analysis in real time could translate into increased profits
      Helps organizations stay ahead of the competition
        • E-commerce – presenting information based on what a user is browsing or interested in could improve sales and the user experience
        • Content creators could produce relevant, high-quality content
  • 9. Case Study – Telecommunication Industry
  • 10. The Company, Challenge & Benefits
      Company
        • Telecom firm providing a wireless network service designed to deliver Machine-to-Machine (M2M) communications to millions of devices.
      Challenge
        • Design a near-real-time solution for predicting patterns based on data generated by M2M communication and sent over the wireless network.
        • The solution should be able to support the addition of near-real-time streams without much change.
      Benefits
        • Enables customers to get real-time alerts for business-critical situations.
        • Enabled customers to react to their critical business needs in real time.
        • Improved customer experience.
        • Reduced operating cost.
  • 11. Examples
      Machine to Machine Communication
        • Vineyard watering
            o Spread over a huge area
            o Critical to maintain the water level threshold
        • Vehicle tracking & geo-fencing
            o Mark the radius of vehicle movement (for example, in valet parking)
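The geo-fencing example above can be illustrated with a distance check. This is only a minimal sketch, assuming the fence is a circle around a fixed center and that vehicle positions arrive as latitude/longitude in degrees. The function names, the mean Earth radius constant and the sample coordinates are illustrative assumptions, not details from the deck.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption for this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def outside_fence(center, position, radius_km):
    """True when position is farther than radius_km from center."""
    return haversine_km(*center, *position) > radius_km


parking = (37.0, -122.0)    # hypothetical valet parking location
nearby = (37.001, -122.0)   # roughly 0.11 km north of the center
far_away = (37.05, -122.0)  # roughly 5.6 km north of the center

nearby_alert = outside_fence(parking, nearby, radius_km=0.5)
far_alert = outside_fence(parking, far_away, radius_km=0.5)
```

With a 0.5 km fence, only the second position would raise an alert.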
  • 12. Incoming Data Attributes
      Continuous input streams
        • Events as they happen
      High data volume
        • 1,000-100,000 events per second
      Varied sources
        • Data coming from multiple sources
  • 13. Expected Goals
      Identify patterns
        • Devices sending incorrect/duplicate data
      Reliability
        • Events are processed as they happen
        • Events are not missed in case of failure
      Scalability
        • Should be able to support an increase in volume
      Capability to add more queries
        • Should be able to add more queries for a particular type of incoming stream
      Notification / alerts system
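One of the patterns named above, devices sending duplicate data, could be detected along these lines. This is a hedged sketch, not the deck's implementation: it assumes events carry a timestamp in seconds, arrive in timestamp order, and count as duplicates when the same device repeats the same payload within a fixed time window. The class name, field names and the 10-second window are hypothetical.

```python
from collections import deque


class DuplicateDetector:
    """Flags an event if the same (device_id, payload) was seen within window_sec."""

    def __init__(self, window_sec):
        self.window_sec = window_sec
        self.recent = deque()  # (timestamp, key) in arrival order
        self.counts = {}       # key -> occurrences currently inside the window

    def is_duplicate(self, timestamp, device_id, payload):
        # Evict entries that have fallen out of the time window.
        while self.recent and timestamp - self.recent[0][0] > self.window_sec:
            _, old_key = self.recent.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]
        key = (device_id, payload)
        seen = key in self.counts
        self.recent.append((timestamp, key))
        self.counts[key] = self.counts.get(key, 0) + 1
        return seen


detector = DuplicateDetector(window_sec=10)
flags = [
    detector.is_duplicate(0, "sensor-1", "level=40"),
    detector.is_duplicate(3, "sensor-1", "level=40"),   # repeat within 10 s
    detector.is_duplicate(20, "sensor-1", "level=40"),  # earlier copies expired
]
```

Only the second event is flagged, because the third arrives after the earlier copies have left the window.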
  • 14. Technology Stack – What is needed?
      Event processing capability
        • Esper
            o Processing engine for data streams
            o SQL-like support: run queries on data streams
            o Sliding windows (time or length)
            o Pattern matching
            o Executes a large number of queries simultaneously
  • 15. Technology Stack – Esper
      Esper - simple steps to get started
        • Get an Esper instance
        • Create a statement (in EPL, Esper's Event Processing Language)
        • Register the statement with the Esper engine
        • Create a listener
        • Attach the listener to the statement
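The five steps above follow a common engine / statement / listener pattern. The sketch below mimics that flow with a toy in-process engine so the order of the steps is visible. It is not Esper's API: `ToyEngine`, `Statement`, `create_statement`, `add_listener` and `send_event` are hypothetical names, and a Python predicate stands in for an EPL query.

```python
class Statement:
    """A registered query: a filter plus the listeners attached to it."""

    def __init__(self, predicate):
        self.predicate = predicate
        self.listeners = []

    def add_listener(self, listener):
        self.listeners.append(listener)


class ToyEngine:
    """Stand-in for an event processing engine (hypothetical, not Esper)."""

    def __init__(self):
        self.statements = []

    def create_statement(self, predicate):
        stmt = Statement(predicate)
        self.statements.append(stmt)  # creating the statement also registers it
        return stmt

    def send_event(self, event):
        # Every registered statement sees every event; matches go to listeners.
        for stmt in self.statements:
            if stmt.predicate(event):
                for listener in stmt.listeners:
                    listener(event)


engine = ToyEngine()                                        # get an engine instance
stmt = engine.create_statement(lambda e: e["price"] > 100)  # create and register a statement
alerts = []                                                 # the listener collects matches here
stmt.add_listener(alerts.append)                            # attach the listener

engine.send_event({"symbol": "ABC", "price": 90})
engine.send_event({"symbol": "XYZ", "price": 120})
```

After both events are sent, `alerts` holds only the event whose price exceeds 100.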
  • 16. Technology Stack – Esper
      Esper – sample queries
      Time-based window
          select avg(price) from StockTickEvent.win:time(30 sec)
      Length-based window
          select symbol, avg(price) as averagePrice from StockTickEvent.win:length(100) group by symbol
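To make the two window types above concrete, here is a plain-Python sketch of what each query computes. It does not use Esper. It assumes ticks arrive in timestamp order, that an event leaves the time window once it is `span_sec` old, and that the length window holds the last N events across all symbols with the average grouped by symbol inside it (my reading of the second query). All class and function names are hypothetical.

```python
from collections import defaultdict, deque


class TimeWindowAvg:
    """Running average of values received in the last span_sec seconds."""

    def __init__(self, span_sec):
        self.span_sec = span_sec
        self.items = deque()  # (timestamp, value)
        self.total = 0.0

    def add(self, timestamp, value):
        self.items.append((timestamp, value))
        self.total += value
        # Drop values that are span_sec old or older.
        while timestamp - self.items[0][0] >= self.span_sec:
            _, old = self.items.popleft()
            self.total -= old
        return self.total / len(self.items)


def avg_by_symbol_last_n(ticks, size):
    """Per-symbol average price over the last `size` ticks."""
    window = deque(ticks, maxlen=size)  # keeps only the newest `size` ticks
    sums = defaultdict(lambda: [0.0, 0])
    for symbol, price in window:
        sums[symbol][0] += price
        sums[symbol][1] += 1
    return {symbol: total / n for symbol, (total, n) in sums.items()}


time_avg = TimeWindowAvg(span_sec=30)
time_results = [time_avg.add(0, 10.0), time_avg.add(10, 20.0), time_avg.add(35, 30.0)]

ticks = [("A", 10.0), ("B", 20.0), ("A", 30.0), ("B", 40.0)]
by_symbol = avg_by_symbol_last_n(ticks, size=3)
```

In the time-window run, the tick at t=0 has expired by t=35, so the last average covers only the ticks at t=10 and t=35.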
  • 17. Technology Stack - Storm
      Data carrier for Esper
        • Storm
            o Facilitates data transfer
            o Continuous computation
            o Distributed, fault tolerant
            o Scalable, no data loss
            o Provides parallelism
            o Acking & replay capability
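The "acking & replay" capability above is what supports the no-data-loss goal: a message stays pending until its processing is acknowledged, and a failed message is emitted again. The sketch below shows that idea in a single process. It is not Storm's API, and `ReplayingSource` with its `next`, `ack` and `fail` methods is a hypothetical stand-in.

```python
from collections import deque


class ReplayingSource:
    """Emits messages and re-queues any message reported as failed."""

    def __init__(self, messages):
        self.queue = deque(enumerate(messages))  # (msg_id, message)
        self.pending = {}                        # msg_id -> message awaiting ack

    def next(self):
        if not self.queue:
            return None
        msg_id, msg = self.queue.popleft()
        self.pending[msg_id] = msg
        return msg_id, msg

    def ack(self, msg_id):
        self.pending.pop(msg_id, None)  # fully processed, forget it

    def fail(self, msg_id):
        self.queue.append((msg_id, self.pending.pop(msg_id)))  # replay later


source = ReplayingSource(["e1", "e2", "e3"])
attempts = {}
processed = []
while (item := source.next()) is not None:
    msg_id, msg = item
    attempts[msg_id] = attempts.get(msg_id, 0) + 1
    if msg == "e2" and attempts[msg_id] == 1:
        source.fail(msg_id)  # simulated failure on the first attempt
    else:
        processed.append(msg)
        source.ack(msg_id)
```

The failed message is processed on its second attempt, so nothing is lost, although ordering is not preserved in this sketch.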
  • 18. Technology Stack - Storm
      Basic concepts of Storm
        • Streams, spouts & bolts
        • A stream is an unbounded sequence of tuples
        • Spouts are data emitters, retrieving data from outside the Storm cluster
        • Bolts are data processors; they receive one or more streams and may emit one or more new streams
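The spout and bolt roles above can be mimicked with Python generators. This is a single-process illustration only, with no Storm API, no distribution and no parallelism. The function names and the water-level readings (borrowed from the vineyard example) are hypothetical.

```python
def sensor_spout(readings):
    """Spout: emits tuples from an outside source (here, an in-memory list)."""
    for device_id, level in readings:
        yield (device_id, level)


def threshold_bolt(stream, minimum):
    """Bolt: consumes one stream and emits only the tuples below minimum."""
    for device_id, level in stream:
        if level < minimum:
            yield (device_id, level)


def alert_bolt(stream):
    """Bolt: turns each incoming tuple into an alert message."""
    for device_id, level in stream:
        yield f"ALERT {device_id}: water level {level}"


readings = [("vine-1", 55), ("vine-2", 18), ("vine-3", 42), ("vine-4", 9)]

# Wire the pieces together: spout -> threshold bolt -> alert bolt.
alerts = list(alert_bolt(threshold_bolt(sensor_spout(readings), minimum=20)))
```

Each stage pulls tuples lazily from the one before it, which loosely mirrors tuples flowing through a chain of bolts.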
  • 19. Technology Stack - Storm
      Storm cluster
        • Topology - a graph of spouts and bolts connected by stream groupings
        • Master node – runs a daemon called Nimbus
            o Distributes code across the cluster
            o Assigns tasks to machines
            o Monitors for failures
        • Worker node - runs a daemon called Supervisor
            o Listens for assigned work
            o Starts/stops worker processes
            o Executes a subset of the topology
        • Coordination between Nimbus and the Supervisors is done through ZooKeeper
  • 20. Technology Stack - Flume
      Log data collection
        • Flume
            o Stream-oriented data flow
            o Log streaming from various sources
            o Collects, aggregates & moves data to a centralized data store
            o Distributed, reliable
            o Failover and recovery mechanism
  • 21. Technology Stack - Flume
      Flume
        • Agent - receives data from an application
        • Collector – writes data to permanent storage
        • Master – separate service controlling all the other nodes
  • 22. Technology Stack - Messaging
      Bridging the gap between Flume & Storm
        • Queue messaging system
            o Robust messaging
            o Flexible routing
            o Highly available
            o Makes the Flume & Storm integration loosely coupled
        • RabbitMQ fits the requirement
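The loose coupling described above comes from the producer and consumer sharing only a queue. The sketch below uses Python's in-process `queue.Queue` as a stand-in for a message broker; it is not RabbitMQ's API. The `collector` and `processor` functions are hypothetical placeholders for the Flume side and the Storm side.

```python
import queue
import threading


def collector(out_queue, lines):
    """Producer side: pushes collected log lines onto the queue."""
    for line in lines:
        out_queue.put(line)
    out_queue.put(None)  # sentinel: no more data


def processor(in_queue, results):
    """Consumer side: pulls lines off the queue and processes them."""
    while True:
        line = in_queue.get()
        if line is None:
            break
        results.append(line.upper())


buffer = queue.Queue(maxsize=2)  # small bound so the producer must wait for the consumer
results = []
log_lines = ["device=1 level=40", "device=2 level=18", "device=3 level=55"]

producer = threading.Thread(target=collector, args=(buffer, log_lines))
consumer = threading.Thread(target=processor, args=(buffer, results))
producer.start()
consumer.start()
producer.join()
consumer.join()
```

Neither side calls the other directly, so either could be replaced or scaled without changing its counterpart.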
  • 23. Fitting it all together
      [Architecture diagram not reproduced in the transcript; the only surviving label is "Data Center"]
  • 24. References
      Esper
        • http://esper.codehaus.org/
      Storm
        • https://github.com/nathanmarz/storm
        • https://github.com/tomdz/storm-esper
      Flume
        • http://archive.cloudera.com/cdh/3/flume/UserGuide/#_architecture
      Queue messaging system
        • http://www.rabbitmq.com/
  • 25. Thank You