Real time analytics case study
Usage Rights

© All Rights Reserved

    Real time analytics case study: Presentation Transcript

    • Real Time Analytics – Big Data Case Study
    • Agenda
        • Big Data
        • Real Time Analytics
        • Why is it needed?
        • Case Study – Telecom Industry
      Impetus Confidential
    • Big Data & Hadoop
    • What is Big Data?
        • Three dimensions of Big Data
            • Volume
                o Collecting terabytes of information or more
            • Velocity
                o Analyzing millions of trade events generated per day
            • Variety
                o Structured or unstructured data such as text, sensor data, click streams, audio, video and log files
    • Big Data
        • Data is key to business; it can be used for
            • User behavior analysis
            • Ad targeting
            • Trending topics
            • Recommendations
        • How?
            • Hadoop is the de facto standard for batch-processing data analytics
                o Provides a parallel computation framework (MapReduce)
                o Redundant, fault-tolerant data storage
                o Designed to store data reliably on commodity machines
                o Designed with hardware failures in mind
            • Based on Google's GFS and MapReduce designs
            • Suitable for real-time analytics? No
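The MapReduce model named on this slide can be sketched in a few lines of plain Python. This is a single-process illustration of the map, shuffle and reduce phases, not Hadoop's API; the function names and the sample records are made up for the example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit a (key, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle and reduce over an in-memory batch of records."""
    pairs = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical batch of logged user actions.
    print(map_reduce(["ad click", "ad view", "click click"]))
```

The whole batch must be available before the reduce step produces results, which is one way to see why the slide answers "No" for real-time use.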
    • Real Time Analytics
    • What is Real Time Analytics?
        • What is it?
            • Real-time analytics is the process of delivering information about events as they occur
        • Some examples
            • Financial industry: fraud detection, trading
            • E-commerce: recommendations
            • Telecom industry: machine-to-machine communication
            • Supply chain management
            • Business activity monitoring
    • Why is it needed?
        • Time is money
            • Intraday risk analysis in real time could translate into increased profits
        • Helps organizations stay ahead of the competition
            • E-commerce: surfacing information based on what a user is browsing or interested in could improve sales and user experience
            • Content creators could produce relevant, high-quality content
    • Case Study – Telecommunication Industry
    • The Company, Challenge & Benefits
        • Company
            • Telecom firm providing a wireless network service designed to deliver machine-to-machine (M2M) communications to millions of devices
        • Challenge
            • Design a near-real-time solution for predicting patterns based on data generated by M2M communication and sent over the wireless network
            • The solution should support adding near-real-time streams without much change
        • Benefits
            • Enable customers to get real-time alerts for business-critical situations
            • Enabled customers to react to their critical business needs in real time
            • Improved customer experience
            • Reduced operating cost
    • Examples
        • Machine to Machine Communication
            • Vineyard watering
                o Spread over a huge area
                o Critical to maintain the water level threshold
            • Vehicle tracking & geo-fencing
                o Mark the radius of vehicle movement (for example, with valet parking)
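Both examples reduce to a simple check on each incoming reading. The sketch below assumes readings carry a water level or a latitude/longitude pair; the function names, units and thresholds are illustrative and not taken from the case study.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def outside_geofence(center, position, radius_m):
    """True when the vehicle position is farther than radius_m from the fence centre."""
    return haversine_m(*center, *position) > radius_m


def water_alert(level, minimum):
    """True when a sensor reports a water level below the required threshold."""
    return level < minimum


if __name__ == "__main__":
    parking = (12.0, 77.0)  # hypothetical valet parking location
    print(outside_geofence(parking, (12.01, 77.0), radius_m=500))
    print(water_alert(level=0.2, minimum=0.3))
```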
    • Incoming Data Attributes
        • Continuous input streams
            • Events as they happen
        • High data volume
            • 1,000 to 100,000 events per second
        • Varied sources
            • Data coming from multiple sources
    • Expected Goals
        • Identify patterns
            • Devices sending incorrect or duplicate data
        • Reliability
            • Events are processed as they happen
            • Events are not missed in case of failure
        • Scalability
            • Should be able to support an increase in volume
        • Capability to add more queries
            • Should be able to add more queries for a particular type of incoming stream
        • Notification / alerts system
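As one illustration of the "identify patterns" goal, a device that repeats the same payload within a short interval can be flagged with a small amount of state. This is a minimal sketch that assumes events arrive with non-decreasing timestamps; the class name, the (device, payload) key and the window length are hypothetical choices.

```python
from collections import OrderedDict


class DuplicateDetector:
    """Flags an event as a duplicate if the same (device, payload) was seen recently."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self._seen = OrderedDict()  # (device_id, payload) -> last seen timestamp

    def is_duplicate(self, device_id, payload, timestamp):
        # Drop entries that fell out of the time window (oldest entries come first).
        while self._seen:
            oldest_key = next(iter(self._seen))
            if timestamp - self._seen[oldest_key] <= self.window:
                break
            self._seen.popitem(last=False)

        key = (device_id, payload)
        duplicate = key in self._seen
        # Record the latest sighting and keep the dict ordered by timestamp.
        self._seen[key] = timestamp
        self._seen.move_to_end(key)
        return duplicate


if __name__ == "__main__":
    detector = DuplicateDetector(window_seconds=10)
    for event in [("dev-1", "temp=20", 0), ("dev-1", "temp=20", 5), ("dev-2", "temp=20", 6)]:
        print(event, detector.is_duplicate(*event))
```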
    • Technology Stack – What is needed?
        • Event processing capability
            • Esper
                o Processing engine for data streams
                o SQL-like support: run queries on data streams
                o Sliding windows (time or length)
                o Pattern matching
                o Executes a large number of queries simultaneously
    • Technology Stack – Esper
        • Esper: simple steps to get started
            • Get an Esper instance
            • Create a statement (Esper Query Language)
            • Register the statement with the Esper engine
            • Create a listener
            • Attach the listener to the statement
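The engine, statement and listener steps above follow a register-then-callback pattern. The toy classes below imitate that pattern in plain Python so the flow is visible end to end. They are not Esper's API: `ToyEngine`, `ToyStatement` and the predicate-based "statement" are stand-ins invented for this sketch.

```python
class ToyStatement:
    """A registered query: here just a predicate over a single event."""

    def __init__(self, predicate):
        self.predicate = predicate
        self.listeners = []

    def add_listener(self, listener):
        """Attach a callback that fires for every matching event."""
        self.listeners.append(listener)


class ToyEngine:
    """Holds statements and pushes each incoming event through all of them."""

    def __init__(self):
        self.statements = []

    def create_statement(self, predicate):
        """Create a statement and register it with the engine in one step."""
        statement = ToyStatement(predicate)
        self.statements.append(statement)
        return statement

    def send_event(self, event):
        for statement in self.statements:
            if statement.predicate(event):
                for listener in statement.listeners:
                    listener(event)


if __name__ == "__main__":
    engine = ToyEngine()                                                  # get an engine instance
    statement = engine.create_statement(lambda e: e["price"] > 100)      # create and register
    statement.add_listener(lambda e: print("alert:", e))                 # create and attach listener
    engine.send_event({"symbol": "ABC", "price": 120})
    engine.send_event({"symbol": "ABC", "price": 80})
```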
    • Technology Stack – Esper
        • Esper: sample queries
            • Time-based window
                select avg(price) from StockTickEvent.win:time(30 sec)
            • Length-based window
                select symbol, avg(price) as averagePrice from StockTickEvent.win:length(100) group by symbol
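To show what the two sample queries compute, the windows can be imitated in plain Python. This is a sketch of the windowing idea only, not of how Esper evaluates EPL. It assumes events arrive in timestamp order and that the length window holds the last N events across all symbols, with grouping applied inside that window; the class names are hypothetical.

```python
from collections import defaultdict, deque


class TimeWindowAverage:
    """Average price over events from the last `seconds` seconds."""

    def __init__(self, seconds):
        self.seconds = seconds
        self._events = deque()  # (timestamp, price), oldest first

    def add(self, timestamp, price):
        self._events.append((timestamp, price))
        # Expire events that are `seconds` old or older relative to the newest event.
        while timestamp - self._events[0][0] >= self.seconds:
            self._events.popleft()
        return sum(p for _, p in self._events) / len(self._events)


class LengthWindowGroupedAverage:
    """Per-symbol average price over the last `length` events."""

    def __init__(self, length):
        self._events = deque(maxlen=length)  # the oldest event is dropped automatically

    def add(self, symbol, price):
        self._events.append((symbol, price))
        totals = defaultdict(lambda: [0.0, 0])  # symbol -> [sum, count]
        for s, p in self._events:
            totals[s][0] += p
            totals[s][1] += 1
        return {s: total / count for s, (total, count) in totals.items()}


if __name__ == "__main__":
    by_time = TimeWindowAverage(seconds=30)
    for ts, price in [(0, 10.0), (10, 20.0), (35, 30.0)]:
        print(ts, by_time.add(ts, price))
```

Recomputing the average on every event keeps the sketch short; a production engine would maintain running aggregates instead.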
    • Technology Stack - Storm
        • Data carrier for Esper
            • Storm
                o Facilitates data transfer
                o Continuous computation
                o Distributed, fault tolerant
                o Scalable, no data loss
                o Provides parallelism
                o Acking and replay capability
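The "acking and replay" point is what backs the no-data-loss claim: a source remembers each emitted tuple until it is acknowledged and re-emits it on failure. The sketch below shows that bookkeeping in plain Python. The method names loosely echo Storm's spout callbacks, but the class itself is a hypothetical stand-in and not Storm code.

```python
from collections import deque


class ReplayingSpout:
    """Emits messages and keeps each one pending until it is acked or failed."""

    def __init__(self, messages):
        self._queue = deque(enumerate(messages))  # (message id, payload)
        self._pending = {}                        # message id -> payload awaiting ack

    def next_tuple(self):
        """Emit the next message, or None when nothing is waiting to be sent."""
        if not self._queue:
            return None
        msg_id, payload = self._queue.popleft()
        self._pending[msg_id] = payload
        return msg_id, payload

    def ack(self, msg_id):
        """Downstream processing succeeded: forget the message."""
        self._pending.pop(msg_id, None)

    def fail(self, msg_id):
        """Downstream processing failed: queue the message for replay."""
        payload = self._pending.pop(msg_id)
        self._queue.append((msg_id, payload))


if __name__ == "__main__":
    spout = ReplayingSpout(["reading-1", "reading-2"])
    msg_id, payload = spout.next_tuple()
    spout.fail(msg_id)          # simulate a failure; the message will be emitted again
    print(spout.next_tuple())   # the second message
    print(spout.next_tuple())   # the replayed first message
```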
    • Technology Stack - Storm
        • Basic concepts of Storm
            • Streams, spouts & bolts
            • A stream is an unbounded sequence of tuples
            • Spouts are data emitters, retrieving data from outside the Storm cluster
            • Bolts are data processors; they receive one or more streams and (potentially) emit one or more new streams
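The stream, spout and bolt roles can be pictured with Python generators: a spout yields tuples, and each bolt consumes one stream and yields another. This is a single-process analogy under assumed names (`sensor_spout`, `filter_bolt`, `count_bolt`), not Storm's programming interface, and it ignores distribution and parallelism.

```python
def sensor_spout(readings):
    """Spout: turns an external data source into a stream of tuples."""
    for device_id, value in readings:
        yield (device_id, value)


def filter_bolt(stream, minimum):
    """Bolt: passes through only tuples whose value is at least `minimum`."""
    for device_id, value in stream:
        if value >= minimum:
            yield (device_id, value)


def count_bolt(stream):
    """Bolt: emits a running count of tuples seen per device."""
    counts = {}
    for device_id, _ in stream:
        counts[device_id] = counts.get(device_id, 0) + 1
        yield (device_id, counts[device_id])


def run_topology(readings, minimum):
    """Wire spout -> filter bolt -> count bolt and collect the output stream."""
    return list(count_bolt(filter_bolt(sensor_spout(readings), minimum)))


if __name__ == "__main__":
    print(run_topology([("a", 5), ("b", 1), ("a", 7)], minimum=2))
```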
    • Technology Stack - Storm
        • Storm cluster
            • Topology: a graph of spouts and bolts connected by stream groupings
            • Master node: runs a daemon called Nimbus
                o Distributes code across the cluster
                o Assigns tasks to machines
                o Monitors for failures
            • Worker node: runs a daemon called Supervisor
                o Listens for assigned work
                o Starts and stops worker processes
                o Executes a subset of the topology
            • Coordination between Nimbus and the Supervisors is done through ZooKeeper
    • Technology Stack - Flume
        • Log data collection
            • Flume
                o Stream-oriented data flow
                o Log streaming from various sources
                o Collects, aggregates and moves data to a centralized data store
                o Distributed, reliable
                o Failover and recovery mechanism
    • Technology Stack - Flume
        • Flume
            • Agent: receives data from an application
            • Collector: writes data to permanent storage
            • Master: separate service controlling all the other nodes
    • Technology Stack - Messaging
        • Bridging the gap between Flume & Storm
            • Queue messaging system
                o Robust messaging
                o Flexible routing
                o Highly available
                o Keeps the Flume and Storm integration loosely coupled
            • RabbitMQ fits the requirement
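The loose coupling a queue provides can be shown with Python's standard `queue` module: the collecting side and the processing side share only the queue and never call each other. This in-process sketch stands in for a message broker and does not use RabbitMQ; the function names, the sentinel convention and the uppercase "processing" step are assumptions made for the example.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream; real events must not be None


def collector(events, out_queue):
    """Producer side (the role Flume plays): push each event onto the queue."""
    for event in events:
        out_queue.put(event)
    out_queue.put(SENTINEL)


def processor(in_queue, results):
    """Consumer side (the role Storm plays): pull events until the sentinel arrives."""
    while True:
        event = in_queue.get()
        if event is SENTINEL:
            break
        results.append(event.upper())  # placeholder for real processing


def run_pipeline(events):
    """Run producer and consumer concurrently, connected only by a bounded queue."""
    buffer = queue.Queue(maxsize=100)  # bounded, so a slow consumer applies back-pressure
    results = []
    producer = threading.Thread(target=collector, args=(events, buffer))
    consumer = threading.Thread(target=processor, args=(buffer, results))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(["log line 1", "log line 2"]))
```

Either side can be replaced or restarted without changing the other, as long as both agree on the message format.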
    • Fitting it all together (architecture diagram not reproduced; surviving label: Data Center)
    • References
        • Esper
            • http://esper.codehaus.org/
        • Storm
            • https://github.com/nathanmarz/storm
            • https://github.com/tomdz/storm-esper
        • Flume
            • http://archive.cloudera.com/cdh/3/flume/UserGuide/#_architecture
        • Queue Messaging System
            • http://www.rabbitmq.com/
    • Thank You