
Monitoring and Troubleshooting a Real Time Pipeline

Alan Ngai, CTO/Co-Founder, OpsClarity

OpsClarity is a performance monitoring solution for stream processing applications. In addition to providing deep component monitoring, it uses data science to proactively identify anomalies across the entire data pipeline, and it correlates issues across the data and application tiers to surface common problems that affect the business. OpsClarity automatically discovers the entire application and data topology, and it is years ahead of anything else in how it uses the rich metadata and network dependency context captured in that topology to provide rich analysis and fast, correlated troubleshooting. This talk also covers integration with Apache Apex.
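
The abstract mentions proactively identifying anomalies, and slide 13 asks whether KPIs and job counters are "within normal range". The talk does not say how OpsClarity decides what is normal, so the following is only a minimal sketch of one common, simple technique: flag a reading that falls outside the rolling mean plus or minus k standard deviations of recent history. The class name `RollingBaseline` and its parameters (`window`, `k`, `min_samples`) are hypothetical, chosen for illustration, and are not part of any product API.

```python
import statistics
from collections import deque


class RollingBaseline:
    """Flags values outside mean +/- k * stdev of the most recent `window` samples.

    A generic rolling z-score check, shown only as an illustration.
    """

    def __init__(self, window: int = 60, k: float = 3.0, min_samples: int = 10):
        self.values = deque(maxlen=window)  # oldest samples drop off automatically
        self.k = k
        self.min_samples = min_samples      # with less history, everything counts as normal

    def is_normal(self, value: float) -> bool:
        """Judge `value` against the current window, then add it to the window."""
        normal = True
        if len(self.values) >= self.min_samples:
            history = list(self.values)
            mean = statistics.fmean(history)
            stdev = statistics.pstdev(history)
            normal = abs(value - mean) <= self.k * stdev
        self.values.append(value)
        return normal


if __name__ == "__main__":
    # Hypothetical per-minute throughput readings for one pipeline stage.
    baseline = RollingBaseline(window=60, k=3.0, min_samples=10)
    for reading in [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 102, 20]:
        if not baseline.is_normal(reading):
            print(f"throughput {reading} is outside the normal range")
```

With these readings only the final value of 20 is flagged. A fixed k-sigma band is easy to reason about, but it assumes roughly stable behavior; metrics with daily or weekly seasonality would need a baseline that accounts for time of day.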

  1. Monitoring and Troubleshooting a Real Time Pipeline. Alan Ngai, CTO/Co-Founder, OpsClarity
  2. Businesses Are Turning to Data-First Applications: • Ad Networks (Real-Time Bidding) • DDoS Attack Prevention • Fraud Detection • Internet of Things • Financial Services • Real-Time Personalization
  3. Data-First Application: Many Moving Parts! [Diagram: the data pipeline (Data Source → Message Broker → Stream Processor → Data Sink), applications (business logic as microservices, code), and elastic infrastructure]
  4. OpsClarity Runs on Data Pipelines
  5. Characteristics of Data Pipelines: • Heterogeneous Components
  6. Characteristics of Data Pipelines: • Heterogeneous Components • Extremely Complex [Diagram: Storm metrics by component: Storm Master Host, Storm Worker Host, Supervisor Process, Topology, Executor, Spout Task, Bolt Tasks]
  7. Characteristics of Data Pipelines: • Heterogeneous Components • Highly Complex • Highly Interdependent
  8. Characteristics of Data Pipelines: • Heterogeneous Components • Highly Interdependent • Highly Complex • Painful to Monitor and Debug
  9. Put Data In One Place. Don't rely on separate tools like these: Kafka Web Console, Spark UI, Marvel (Elasticsearch), Ambari (Hadoop), Ganglia, Nagios
  10. Organize Your Concerns Horizontally: • Throughput: stuff per unit of time • Latency: how long it takes to process stuff • Error Rate: how frequently bad stuff happens • Buffered: how much stuff is piled up • Data Loss: how much stuff is being lost • Duplication: how much stuff is being duplicated. Matters for all stages in a pipeline! Matters for all business use cases too!
  11. Organize Your Concerns Horizontally: • Throughput • Latency • Error Rate • Buffered • Data Loss • Duplication
  12. …And Also Vertically: [Diagram: Storm metrics by component: Storm Master Host, Storm Worker Host, Supervisor Process, Topology, Executor, Spout Task, Bolt Tasks]
  13. …And Also Vertically: • Data Health • Dependency Health • Service Health • Application Job/Topology Health • Node Service Health • Node System Health. Questions to ask: Throughput, latency, errors? Are Kafka and ZooKeeper healthy? Is the Storm Master healthy? Are there adequate resources in the cluster? Are my application KPIs within normal range? Is my job well distributed in the cluster? Are job counters normal? Are all jobs running on this node normal? Are key system metrics (CPU, memory, network, disk I/O) normal?
  14. DEMO
  15. What We Talked About: • Data-First Applications Are Becoming a Thing • Monitoring Data-First Applications Is Hard! • Get Your Metrics In One Place • Organize Your Data Horizontally and Vertically
  16. Questions? Alan Ngai, alan@opsclarity.com
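
Slide 9 recommends putting metrics in one place rather than relying on each component's own console. A minimal sketch of that idea, assuming every source can be translated into one common record shape: `Sample`, `MetricStore`, and the two adapter functions are hypothetical names, and the payload fields the broker adapter reads (`broker`, `bytesIn`) are made up for illustration rather than taken from any real tool's API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One metric reading in a common, source-independent shape."""
    source: str      # which collector produced it
    component: str   # host, broker, topology, ...
    metric: str      # normalized metric name
    value: float
    ts: float        # seconds since the epoch


class MetricStore:
    """Keeps samples from every source, indexed by component."""

    def __init__(self) -> None:
        self._by_component = defaultdict(list)

    def ingest(self, samples: list[Sample]) -> None:
        for sample in samples:
            self._by_component[sample.component].append(sample)

    def latest(self, component: str) -> dict[str, float]:
        """Most recent value of each metric for one component, across all sources."""
        result = {}
        for sample in sorted(self._by_component[component], key=lambda s: s.ts):
            result[sample.metric] = sample.value  # later timestamps overwrite earlier ones
        return result


# Hypothetical adapters: each translates one tool's payload into Sample records.
def from_broker_poll(payload: dict, ts: float) -> list[Sample]:
    return [Sample("broker-poll", payload["broker"], "bytes_in_per_s", float(payload["bytesIn"]), ts)]


def from_host_agent(host: str, cpu_utilization: float, ts: float) -> list[Sample]:
    return [Sample("host-agent", host, "cpu_utilization", cpu_utilization, ts)]


if __name__ == "__main__":
    store = MetricStore()
    store.ingest(from_host_agent("worker-1", 0.42, ts=1.0))
    store.ingest(from_host_agent("worker-1", 0.71, ts=2.0))
    store.ingest(from_broker_poll({"broker": "worker-1", "bytesIn": 1_000_000}, ts=2.0))
    print(store.latest("worker-1"))
```

The point of the adapters is that each tool keeps its own format while everything downstream (dashboards, alerting, correlation) sees a single schema keyed by component.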

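Slide 10 defines six horizontal concerns that apply to every stage of a pipeline. A minimal sketch of computing them for one stage, assuming the stage exposes cumulative counters since it started (the `StageCounters` fields are hypothetical) and that every record entering the stage ends up emitted once, failed, still buffered, or lost. Under that assumption, data loss is whatever the other counters do not account for. Latency is summarized as a nearest-rank 99th percentile, which is one choice among many.

```python
import math
from dataclasses import dataclass


@dataclass
class StageCounters:
    """Hypothetical cumulative counters for one pipeline stage since it started."""
    elapsed_seconds: float
    records_in: int            # records received from upstream
    records_out: int           # records emitted downstream, duplicates included
    records_failed: int        # records that hit an error
    records_duplicated: int    # emitted records that had already been emitted
    buffered: int              # records received but not yet processed
    latencies_ms: list[float]  # per-record processing times


def p99(values: list[float]) -> float:
    """Nearest-rank 99th percentile (0.0 for an empty list)."""
    if not values:
        return 0.0
    ordered = sorted(values)
    rank = math.ceil(0.99 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def horizontal_metrics(c: StageCounters) -> dict[str, float]:
    unique_out = c.records_out - c.records_duplicated
    # Anything that entered but was not emitted, failed, or still buffered counts as lost.
    lost = max(0, c.records_in - unique_out - c.records_failed - c.buffered)
    return {
        "throughput_per_s": unique_out / c.elapsed_seconds if c.elapsed_seconds else 0.0,
        "p99_latency_ms": p99(c.latencies_ms),
        "error_rate": c.records_failed / c.records_in if c.records_in else 0.0,
        "buffered": float(c.buffered),
        "data_loss": float(lost),
        "duplication_rate": c.records_duplicated / c.records_out if c.records_out else 0.0,
    }


if __name__ == "__main__":
    stage = StageCounters(
        elapsed_seconds=10.0, records_in=1000, records_out=960, records_failed=10,
        records_duplicated=20, buffered=25, latencies_ms=[5.0, 10.0, 15.0, 20.0],
    )
    for name, value in horizontal_metrics(stage).items():
        print(f"{name}: {value:.3f}")
```

In this example, 1000 records entered and 940 unique records left, 10 failed, and 25 are still buffered, so 25 are counted as lost. Because the same six numbers are computed for every stage, they can be compared side by side along the whole pipeline.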
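
Slide 13 lists health layers from the data itself down to a node's system metrics, each with its own questions. A minimal sketch of walking that stack, assuming each question can be expressed as a named boolean check. The layer names come from the slide; `unhealthy_layers`, the check names, and the threshold values are hypothetical. Reporting the deepest failing layer first is also an assumption: it reflects the intuition that a node-level problem often explains symptoms in the layers above it, not a rule stated in the talk.

```python
from typing import Callable

# Layer names from slide 13, ordered from the data (top) down to the node (bottom).
LAYERS = [
    "Data Health",
    "Dependency Health",
    "Service Health",
    "Application Job/Topology Health",
    "Node Service Health",
    "Node System Health",
]

Check = Callable[[], bool]  # returns True when healthy


def unhealthy_layers(checks: dict[str, list[tuple[str, Check]]]) -> list[tuple[str, list[str]]]:
    """Run every check and return (layer, failed check names), deepest layer first."""
    report = []
    for layer in reversed(LAYERS):
        failed = [name for name, check in checks.get(layer, []) if not check()]
        if failed:
            report.append((layer, failed))
    return report


if __name__ == "__main__":
    # Hypothetical readings; in practice they would come from your collectors.
    cpu_utilization = 0.97
    throughput_is_normal = False
    checks = {
        "Data Health": [("throughput within normal range", lambda: throughput_is_normal)],
        "Dependency Health": [("Kafka and ZooKeeper reachable", lambda: True)],
        "Node System Health": [("CPU below 90%", lambda: cpu_utilization < 0.9)],
    }
    for layer, failed in unhealthy_layers(checks):
        print(f"{layer}: {', '.join(failed)}")
```

Here the saturated CPU is reported before the abnormal throughput, which points the investigation at the node before the data.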