Velocity 2015-final

Stream Processing and Anomaly Detection @Twitter

Published in: Technology

  1. Stream Processing and Anomaly Detection. Sailesh Mittal, Karthik Ramasamy, Arun Kejariwal (@saileshmittal, @karthikz, @arun_kejariwal)
  2. GOING REAL TIME [1] [2] [3]. [1] http://www.adweek.com/news/technology/movie-studio-first-live-stream-trailer-premiere-twitters-periscope-163810 [2] http://qz.com/366433/american-airlines-is-playing-better-music-onboard-thanks-to-your-twitter-complaints [3] http://www.theverge.com/2015/5/3/8539483/periscope-made-it-easy-to-watch-the-mayweather-pacquiao-fight-for-free
  3. WHY REAL TIME? Real-time trends: emerging breakout trends on Twitter (in the form of #hashtags). Real-time conversations: real-time sports conversations related to a topic (a recent goal or touchdown). Real-time recommendations: real-time product recommendations based on your behavior and profile. Real-time search: real-time search of tweets with a budget of < 200 ms.
  4. STREAMING ANALYTICS (section I)
  5. TYPES OF ANALYTICS (varieties). Cube analytics: business intelligence. Predictive analytics: statistics and machine learning.
  6. DIMENSIONS OF ANALYTICS (variants). Batch: ability to provide insights several hours or days after a query is posed. Real time: ability to analyze the data instantly.
  7. REAL TIME ANALYTICS (dichotomy). Streaming: analyze data as it is being produced. Interactive: store data and provide results instantly when a query is posed.
  8. STREAMING VS. INTERACTIVE (dichotomy). Interactive analytics (diagram labels): bulk-load data, data storage, database server, queries, static batch results/reports. Streaming analytics (diagram labels): data stream processing, data storage, queries, results; real-time alerts, real-time analytics, continuous visibility.
  9. WHAT IS REAL TIME? Milliseconds, seconds, or minutes? Batch: ad hoc queries, high throughput, a few hours/days. Real time: visibility, approximate, a few seconds. OLTP: deterministic workflows, latency sensitive, < 500 ms.
  10. STREAMING SYSTEMS: first generation (SQL). NiagaraCQ Query Engine [Chen et al., SIGMOD 2000]; STREAM: The Stanford Stream Data Manager [Arasu et al., SIGMOD 2003]; Aurora: A Data Stream Management Engine [Abadi et al., SIGMOD 2003]; The Design of the Borealis Stream Processing Engine [Abadi et al., CIDR 2005]; Cayuga: A general purpose event monitoring system [Demers et al., CIDR 2007].
  11. STREAMING SYSTEMS: next generation (too many).
  12. STORM (section II)
  13. WHAT IS STORM? A streaming platform for analyzing real-time data as it arrives, so you can react to data as it happens. Guaranteed message processing; horizontal scalability; robust fault tolerance; concise code (focus on logic).
  14. STORM DATA MODEL. Topology: directed acyclic graph; vertices = computation, edges = streams of data. Spouts: sources of data for the topology (e.g., Postgres/MySQL/Kafka/Kestrel). Bolts: units of computation on data (e.g., filtering/aggregation/join/transformations).
  15. WORD COUNT TOPOLOGY (logical plan). Live stream of tweets -> Tweet Spout -> Parse Tweet Bolt -> Word Count Bolt -> counts (#worldcup: 1M, soccer: 400K, …).
  16. WORD COUNT TOPOLOGY. Tweet spout tasks -> parse tweet bolt tasks -> word count bolt tasks. When a parse tweet bolt task emits a tuple, which word count bolt task should it be sent to?
  17. STREAM GROUPINGS (combining data). Shuffle grouping: randomly distributes tuples to next-stage bolt instances. Fields grouping: groups tuples by a single column value or multiple column values. All grouping: replicates tuples to all next-stage bolt instances. Global grouping: sends all the tuples to a single next-stage bolt instance.
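
The routing question on slide 16 and the four groupings on slide 17 can be sketched in a few lines of Python. This is an illustrative model only, not Storm's API: the function names, the use of CRC32 as the hash, and the choice of task 0 for global grouping are all assumptions made for the example.

```python
import random
import zlib
from typing import Dict, List, Sequence


def shuffle_grouping(num_tasks: int, rng: random.Random) -> List[int]:
    """Shuffle grouping: pick one downstream task uniformly at random."""
    return [rng.randrange(num_tasks)]


def fields_grouping(tup: Dict[str, str], fields: Sequence[str], num_tasks: int) -> List[int]:
    """Fields grouping: hash the chosen fields so equal values share a task."""
    key = "\x1f".join(tup[name] for name in fields).encode("utf-8")
    # CRC32 is an assumption for this sketch; it is stable across runs,
    # unlike Python's built-in hash() for strings.
    return [zlib.crc32(key) % num_tasks]


def all_grouping(num_tasks: int) -> List[int]:
    """All grouping: replicate the tuple to every downstream task."""
    return list(range(num_tasks))


def global_grouping(num_tasks: int) -> List[int]:
    """Global grouping: send every tuple to one task (index 0 by assumption)."""
    return [0]


def word_count(words: Sequence[str], num_tasks: int) -> List[Dict[str, int]]:
    """Route words with fields grouping and keep one counter per task."""
    counters: List[Dict[str, int]] = [{} for _ in range(num_tasks)]
    for word in words:
        (task,) = fields_grouping({"word": word}, ["word"], num_tasks)
        counters[task][word] = counters[task].get(word, 0) + 1
    return counters
```

Fields grouping is the natural fit for the word count bolt: every occurrence of a word reaches the same task, so each task can keep a local counter without coordinating with the others. Shuffle grouping would balance load better but would split a word's count across tasks.
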
  18. STORM INTERNALS (section III)
  19. STORM ARCHITECTURE. Diagram labels: topology submission to Nimbus (master node); assignment maps; ZK cluster; two slave nodes, each with a Supervisor and workers W1 to W4. Callouts: multiple functionality (scheduling/monitoring), single point of failure, storage contention.
  20. STORM WORKER. A worker is a JVM process that holds executors, each of which runs tasks. Callouts: complex hierarchy, difficult to tune, hard to debug.
  21. DATA FLOW IN STORM WORKERS. Kernel TCP receive buffer -> global receive thread -> in queues -> user logic threads -> out queues -> send thread -> outgoing message buffer -> global send thread -> kernel TCP send buffer. Queue types: Disruptor queues, 0mq queues. Callouts: queue contention, multiple languages.
  22. STORM @TWITTER (section IV)
  23. STORM @TWITTER. Large amount of data produced every day; largest Storm cluster; several topologies deployed; several billion messages every day. Figures shown: > thousands, > 50 TB, > hundreds, > 3B; 1 stage, 8 stages.
  24. [1, 2]. [1] Published in SIGMOD '14. [2] https://storm.apache.org/
  25. STORM METRICS. Continuous performance; cluster availability; support and troubleshooting.
  26. COLLECTING TOPOLOGY METRICS. Tweet Spout, Parse Tweet Bolt, Word Count Bolt -> Metrics Bolt -> Scribe.
  27. SAMPLE TOPOLOGY DASHBOARD
  28. STORM OPERATIONS. Bad host; hot keys; network issues.
  29. ANOMALY DETECTION (section V)
  30. PERFORMANCE BOTTLENECKS (impact and common symptoms). Real-time processing: tweets, retweets. Failures: slow writes to data store, connectivity issues. Backpressure; container deaths. Chart: # ms spent under backpressure.
  31. PERFORMANCE BOTTLENECKS (potential causes). Spike in input traffic; hot keys/connectivity issues; anomalous nodes. Charts: Emit Count, Kestrel Spout Lag.
  32. FINDING “ANOMALOUS” NODES. Example topology: large # of containers, several instances per container, multiple metrics per instance.
  33. FINDING “ANOMALOUS” NODES. Distribution of # containers and # tasks. 50 metrics per instance; address only the top 5 key metrics.
  34. FINDING “ANOMALOUS” NODES. Statistically robust: minimize false positives. Fast. Automated: large # of topologies and large # of containers per topology.
  35. FINDING “ANOMALOUS” NODES. R package [1]: widely used outside Twitter. Seasonality and trend aware: employs time series decomposition and robust statistics. Key features: filter/expected values/long term. [1] https://blog.twitter.com/2015/introducing-practical-and-robust-anomaly-detection-in-a-time-series
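
As a rough illustration of the "time series decomposition and robust statistics" idea on slide 35, the sketch below removes a crude seasonal component and flags residuals that are far from the median in units of the median absolute deviation (MAD). It is a deliberately simplified stand-in, not the algorithm of the R package cited on the slide: the function name, the per-phase median as the seasonal estimate, the threshold of 3, and the omission of trend handling are all assumptions.

```python
from statistics import median
from typing import List


def robust_anomalies(series: List[float], period: int, threshold: float = 3.0) -> List[int]:
    """Return indices whose deseasonalized value is a robust outlier.

    Assumes len(series) >= period so that every phase has at least one point.
    """
    # Crude seasonal component: the median of all points at the same phase.
    seasonal = [median(series[phase::period]) for phase in range(period)]
    residuals = [x - seasonal[i % period] for i, x in enumerate(series)]

    # Median and MAD resist distortion by the very outliers being sought,
    # unlike the mean and standard deviation.
    center = median(residuals)
    mad = median([abs(r - center) for r in residuals])
    scale = 1.4826 * mad  # makes MAD comparable to a standard deviation
    if scale == 0:
        # Degenerate case: more than half the residuals are identical,
        # so any deviation from the center is flagged.
        return [i for i, r in enumerate(residuals) if r != center]
    return [i for i, r in enumerate(residuals) if abs(r - center) / scale > threshold]
```

Using the median and MAD instead of the mean and standard deviation is what keeps a single large spike from inflating the scale estimate and hiding itself, which is one way to pursue the "minimize false positives" goal stated on slide 34.
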
  36. FINDING “ANOMALOUS” NODES. R package: applicable to univariate time series. Leverage multiple metrics: minimize false positives. Exploit correlation/topology: observed variables [1] and latent variables. [1] "Automatic Failure Diagnosis in Distributed Large-Scale Software Systems based on Timing Behavior Anomaly Correlation", by Marwede, N. S., Rohr, M., van Hoorn, A. and Hasselbring, W. In European CSMR, March 24-27, 2009.
  37. FINDING “ANOMALOUS” NODES (intersection analysis). Service component health: determine the intersection of the sets of anomalies of each instance. Host health: determine the intersection of the sets of anomalies of each process.
  38. FINDING “ANOMALOUS” NODES (intersection analysis, validation). Anomaly type, input spike: all metrics had sudden spikes. Anomaly type, container death: all metrics of instances on that container had drops.
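
The intersection analysis on slides 37 and 38 can be sketched with plain set operations. The example assumes that a per-metric detector has already produced, for each metric of each instance, a set of anomalous time buckets; the function names, the metric names, and the data layout are hypothetical.

```python
from typing import Dict, Iterable, Set


def common_anomalies(anomaly_sets: Iterable[Set[int]]) -> Set[int]:
    """Time buckets that are anomalous in every one of the given sets."""
    sets = list(anomaly_sets)
    if not sets:
        return set()
    return set.intersection(*sets)


def instance_anomalies(metric_anomalies: Dict[str, Set[int]]) -> Set[int]:
    """An instance is flagged only when all of its metrics agree."""
    return common_anomalies(metric_anomalies.values())


def container_anomalies(instances: Dict[str, Dict[str, Set[int]]]) -> Set[int]:
    """A container is flagged only when all of its instances are flagged."""
    return common_anomalies(instance_anomalies(m) for m in instances.values())


# Hypothetical input: anomalous time buckets per metric, per instance.
container = {
    "bolt-1": {"emit_count": {5, 9}, "latency": {5}},
    "bolt-2": {"emit_count": {5}, "latency": {5, 7}},
}
flagged = container_anomalies(container)  # buckets shared by every metric of every instance
```

Requiring agreement across metrics and instances trades recall for precision: an anomaly seen in only one metric is ignored, while an event such as a container death, where every metric of every instance on the container drops together, survives the intersection.
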
  39. QUESTIONS and ANSWERS. Go ahead. Ask away. Give us your best shot.
  40. THANK YOU FOR LISTENING
