OSMC 2018 | Monitoring Distributed Systems by Philip Griesbacher

From an external observer's perspective, the components of a distributed system start and terminate unpredictably, which makes monitoring challenging. A component can also run several times on a single server as well as on multiple machines. The Hadoop ecosystem is one example of such a distributed application and the primary example of this talk. The fundamental question to be addressed is: how can such unpredictable distributed systems be monitored? This talk presents a general analysis of the problem and its existing solutions. Based on this analysis, a new theoretical concept is developed and realized in a practical solution. A fully automated monitoring solution for distributed systems will be demonstrated. The solution is flexible and portable and can therefore also be applied outside the Hadoop environment. This new solution is an efficient and promising contribution to the monitoring community.

  1. MONITORING DISTRIBUTED SYSTEMS | PHILIP GRIESBACHER @ OSMC 2018
  2. Monitoring Distributed Systems | Philip Griesbacher | 07.11.2018 @ OSMC
     Section 1: WHO AM I? AND WHERE DO I COME FROM?
  3. ABOUT ME AND THIS TALK
     • Past: 5 years at ConSol's Monitoring Team
     • Finished my Master of Science in the field of Computer Science
       - Specialization: Distributed Systems
       - Master Thesis: Concepts for Monitoring Distributed Systems
     • Now: Big Data, Machine Learning, Artificial Intelligence Department at BMW as System Architect for Cloud Native Topics
  4. THE BMW GROUP IN NUMBERS (2017)
     • 129,900 employees worldwide
     • 2,463,526 cars sold worldwide
  5. BIG DATA, MACHINE LEARNING, ARTIFICIAL INTELLIGENCE @ BMW GROUP
     • Storage:
       - ~3 PB Hadoop
       - ~2 PB Streaming and other
     • Memory:
       - ~25 TB Hadoop
       - ~25 TB Streaming and other
     • ~1 TB data growth per day
  6. Section 2: INTRODUCTION
  7. INTRODUCTION: DISTRIBUTED SYSTEMS
     "A distributed system is a collection of independent computers that appears to its users as a single coherent system." - Tanenbaum & van Steen (2006)
  8. INTRODUCTION: DISTRIBUTED SYSTEMS
     Hadoop Ecosystem
     • YARN
     • HDFS
     • MapReduce
     • (Spark)
  9. INTRODUCTION: RESEARCH QUESTION
     How to monitor distributed systems?
  10. Section 3: ANALYSIS OF MONITORING CONCEPTS AND SYSTEMS
  11. ANALYSIS OF MONITORING CONCEPTS AND SYSTEMS: ANALYSIS OF MONITORING CONCEPTS
      • Push vs. Pull
      • Blackbox vs. Whitebox
      • Agent-based vs. Agent-less
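
The first distinction on this slide, push vs. pull, can be illustrated with a small in-memory sketch. This is not taken from the talk: the class names `Target`, `PullMonitor` and `PushCollector` are hypothetical, and no real monitoring system or network I/O is involved.

```python
class Target:
    """A monitored component that counts the requests it has handled."""

    def __init__(self, name):
        self.name = name
        self.requests = 0

    def handle_request(self):
        self.requests += 1

    def read_metrics(self):
        # Snapshot of the current metric values.
        return {"requests": self.requests}


class PullMonitor:
    """Pull: the monitor knows its targets and asks each one for metrics."""

    def __init__(self, targets):
        self.targets = list(targets)
        self.samples = {}

    def scrape(self):
        for target in self.targets:
            self.samples[target.name] = target.read_metrics()


class PushCollector:
    """Push: the collector knows no targets; components send metrics to it."""

    def __init__(self):
        self.samples = {}

    def receive(self, name, metrics):
        self.samples[name] = metrics


worker = Target("worker-1")
worker.handle_request()
worker.handle_request()

# Pull model: the monitoring side initiates the transfer.
pull = PullMonitor([worker])
pull.scrape()

# Push model: the monitored side initiates the transfer.
push = PushCollector()
push.receive(worker.name, worker.read_metrics())

print(pull.samples)
print(push.samples)
```

Both models end up with the same sample here. What differs is who needs to know about whom: the pull monitor needs a list of targets, while the push collector only needs to be reachable.
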
  12. ANALYSIS OF MONITORING CONCEPTS AND SYSTEMS: REQUIREMENTS ON MONITORING SYSTEMS
      • Scalability
      • Robustness
      • Extensibility
      • Manageability / Administratively Scalable
      • Portability
      • Overhead
      • Security
  13. ANALYSIS OF MONITORING CONCEPTS AND SYSTEMS: CONCLUSION
      Systems compared: Ganglia, Nagios, Prometheus
      Pro:
      - Manageability / Administratively Scalable
      - Extensibility
      - Robustness
      - Portability
      - Overhead
      Contra:
      - Geographical Scaling
      - Security
      - Overhead
      - Manageability / Administratively Scalable
      → Nagios + Prometheus
  14. Section 4: MONITORING SOLUTION
  15. MONITORING SOLUTION: SHOWCASE ENVIRONMENT
      4 (ResourceManagers and NameNodes) + 252 * 2 (DataNodes and NodeManagers) + 252 * 3 (ApplicationMaster, Map and Reduce) = 1264 processes at minimum.
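
The process count on this slide can be checked with a few lines of arithmetic. The sketch below assumes that 252 is the number of worker machines and that each one runs at least one ApplicationMaster, Map and Reduce process; the slide gives only the factors, so the constant and function names are illustrative.

```python
# Factors taken from the slide; their interpretation is an assumption.
MASTER_PROCESSES = 4       # ResourceManagers and NameNodes
WORKER_NODES = 252         # assumed: number of worker machines
STATIC_PER_WORKER = 2      # DataNode and NodeManager
DYNAMIC_PER_WORKER = 3     # ApplicationMaster, Map and Reduce


def minimum_process_count(masters, workers, static_per_worker, dynamic_per_worker):
    """Lower bound on the number of processes that need monitoring."""
    static = workers * static_per_worker      # 252 * 2 = 504
    dynamic = workers * dynamic_per_worker    # 252 * 3 = 756
    return masters + static + dynamic


total = minimum_process_count(
    MASTER_PROCESSES, WORKER_NODES, STATIC_PER_WORKER, DYNAMIC_PER_WORKER
)
print(total)  # 1264
```
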
  16. MONITORING SOLUTION: CHALLENGE
      "If a human operator needs to touch your system during normal operations, you have a bug." - Carla Geisser, Google
  17. MONITORING SOLUTION: APPLICATION TYPES
      Static Applications:
      • ResourceManagers
      • NameNodes
      • DataNodes
      • NodeManagers
      Dynamic Applications:
      • ApplicationMaster
      • Map
      • Reduce
  18. MONITORING SOLUTION: SERVICE DISCOVERY - BACKGROUND
      Diagram: Container1, Container2, Container3
  19. MONITORING SOLUTION: SERVICE DISCOVERY - BACKGROUND
      Diagram: Container1, Container2, Container3, Registry; step 1
  20. MONITORING SOLUTION: SERVICE DISCOVERY - BACKGROUND
      Diagram: Container1, Container2, Container3, Client, Registry; steps 1, 2
  21. MONITORING SOLUTION: SERVICE DISCOVERY - BACKGROUND
      Diagram: Container1, Container2, Container3, Client, Registry; steps 1, 2, 3
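
Only the diagram labels survive in this transcript, so what steps 1 to 3 stand for is not stated. The sketch below assumes the common client-side discovery pattern: (1) containers register with the registry, (2) the client looks a service up, (3) the client contacts the container directly. `Endpoint`, `Registry` and `connect` are hypothetical names, the addresses are made up, and nothing here reflects a specific registry product.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Endpoint:
    name: str
    host: str
    port: int


@dataclass
class Registry:
    """In-memory stand-in for a service registry (illustrative only)."""

    _services: dict = field(default_factory=dict)

    def register(self, endpoint):
        # Assumed step 1: a container announces itself when it starts.
        self._services[endpoint.name] = endpoint

    def deregister(self, name):
        # A container that terminates is removed again.
        self._services.pop(name, None)

    def lookup(self, name):
        # Assumed step 2: a client asks where a service currently lives.
        return self._services.get(name)


def connect(endpoint):
    # Assumed step 3: the client talks to the container directly.
    # Placeholder without network I/O; it only builds the address.
    return f"{endpoint.host}:{endpoint.port}"


registry = Registry()
for i in range(1, 4):
    registry.register(Endpoint(f"Container{i}", f"10.0.0.{i}", 9100 + i))

target = registry.lookup("Container2")
address = connect(target)
print(address)

# Dynamic components disappear without notice; lookups then return None.
registry.deregister("Container3")
print(registry.lookup("Container3"))
```

Under this reading, a monitoring system can take the role of the client: it never needs a static host list, because the registry always reflects which containers currently exist.
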
  22. MONITORING SOLUTION: IDEA
  23. MONITORING SOLUTION: JOCOSE
  24. MONITORING SOLUTION: PCAP_EXPORTER
      Network-traffic exporter
      Filter:
      • Source and Destination
        - Port
        - IP
      • Protocol
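
The filter dimensions named on this slide (source and destination IP and port, plus protocol) can be sketched as a simple predicate over packet records. This is not pcap_exporter's code or configuration format: `Packet`, `PacketFilter` and `bytes_matching` are hypothetical, and the addresses and ports are made-up sample values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str
    size: int  # bytes


@dataclass(frozen=True)
class PacketFilter:
    """Each field left as None matches any value."""

    src_ip: Optional[str] = None
    src_port: Optional[int] = None
    dst_ip: Optional[str] = None
    dst_port: Optional[int] = None
    protocol: Optional[str] = None

    def matches(self, packet):
        criteria = (
            (self.src_ip, packet.src_ip),
            (self.src_port, packet.src_port),
            (self.dst_ip, packet.dst_ip),
            (self.dst_port, packet.dst_port),
            (self.protocol, packet.protocol),
        )
        return all(wanted is None or wanted == actual for wanted, actual in criteria)


def bytes_matching(packets, packet_filter):
    """Sum the sizes of all packets accepted by the filter."""
    return sum(p.size for p in packets if packet_filter.matches(p))


packets = [
    Packet("10.0.0.1", 40000, "10.0.0.2", 8020, "tcp", 1500),
    Packet("10.0.0.3", 40001, "10.0.0.2", 8020, "tcp", 500),
    Packet("10.0.0.1", 40002, "10.0.0.2", 53, "udp", 80),
]

example_filter = PacketFilter(dst_ip="10.0.0.2", dst_port=8020, protocol="tcp")
total_bytes = bytes_matching(packets, example_filter)
print(total_bytes)  # 2000
```

An exporter built along these lines would expose such sums as metrics, one series per filter or per source/destination pair, which is the kind of data a network graph panel can draw.
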
  25. MONITORING SOLUTION: GRAFANA GRAPH PANEL
  26. MONITORING SOLUTION: GRAFANA NETWORK GRAPH PANEL
  27. Section 5: EVALUATION AND CONCLUSION
  28. EVALUATION: UTILIZATION - RESOURCE MONETIZATION
  29. CONCLUSION
      • Fully automated monitoring
      • Flexible and portable concept
      • Scales with the system to monitor
  30. Section 6: DEMO
  31. Section 7: Q & A
