OSMC 2018 | Monitoring Distributed Systems by Philip Griesbacher


From an external observer's perspective, the components of a distributed system start and terminate unpredictably, which makes monitoring challenging. A component can also run several times on a single server as well as on multiple machines. The Hadoop ecosystem is one example of such a distributed application and the primary example of this talk. The fundamental question to be addressed is: how can such unpredictable distributed systems be monitored? This talk presents a general analysis of the problem and its existing solutions. Based on this analysis, a new theoretical concept is developed and realized in a practical solution. A fully automated monitoring solution for distributed systems will be demonstrated. The solution is flexible and portable and can therefore also be applied outside the Hadoop environment. This new solution is an efficient and promising contribution to the monitoring community.


MONITORING DISTRIBUTED SYSTEMS
PHILIP GRIESBACHER @ OSMC 2018

Monitoring Distributed Systems | Philip Griesbacher | 07.11.2018 @ OSMC
1
WHO AM I? AND WHERE DO I COME FROM?

ABOUT ME AND THIS TALK
• Past: 5 years at ConSol's Monitoring Team
• Finished my Master of Science in the field of Computer Science
- Specialization: Distributed Systems
- Master Thesis: Concepts for Monitoring Distributed Systems
• Now: BigData, Machine Learning, Artificial Intelligence Department at BMW as System Architect for Cloud Native Topics

THE BMW GROUP IN NUMBERS (2017)
129,900
Employees worldwide
2,463,526
Cars sold worldwide

BIGDATA, MACHINE LEARNING, ARTIFICIAL INTELLIGENCE @ BMW GROUP
• Storage:
- ~ 3 PB Hadoop
- ~ 2 PB Streaming and other
• Memory:
- ~ 25 TB Hadoop
- ~ 25 TB Streaming and other
• ~ 1 TB data growth per day

2
INTRODUCTION

DISTRIBUTED SYSTEMS
“A distributed system is a collection of independent computers that appears to its users as a single coherent system.”
- Tanenbaum & van Steen (2006)

DISTRIBUTED SYSTEMS
Hadoop Ecosystem
• YARN
• HDFS
• MapReduce
• (Spark)

RESEARCH QUESTION
How to monitor distributed systems?

3
ANALYSIS OF MONITORING CONCEPTS AND SYSTEMS

ANALYSIS OF MONITORING CONCEPTS
• Push vs. Pull
• Blackbox vs. Whitebox
• Agent-based vs. Agent-less
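
As an illustration of the first distinction, the sketch below contrasts the two collection models. All names (`PullCollector`, `PushCollector`, `read_load`) are hypothetical and are not taken from the talk or from any monitoring product. In the pull model the monitoring system asks each known target for its current value; in the push model the monitored component reports on its own initiative.

```python
from typing import Callable, Dict


class PullCollector:
    """Pull model: the collector knows its targets and asks each one for a value."""

    def __init__(self) -> None:
        self.targets: Dict[str, Callable[[], float]] = {}
        self.samples: Dict[str, float] = {}

    def add_target(self, name: str, read_metric: Callable[[], float]) -> None:
        self.targets[name] = read_metric

    def scrape(self) -> None:
        # The collector initiates every measurement.
        for name, read_metric in self.targets.items():
            self.samples[name] = read_metric()


class PushCollector:
    """Push model: the collector only stores what components send to it."""

    def __init__(self) -> None:
        self.samples: Dict[str, float] = {}

    def receive(self, name: str, value: float) -> None:
        # The monitored component initiates every measurement.
        self.samples[name] = value


def read_load() -> float:
    """Hypothetical metric source standing in for a real component."""
    return 0.42


pull = PullCollector()
pull.add_target("datanode-1", read_load)
pull.scrape()

push = PushCollector()
push.receive("map-task-7", read_load())

print(pull.samples)
print(push.samples)
```

In this sketch the pull collector needs a target list before it can measure anything, whereas the push collector accepts samples from components it was never told about.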

REQUIREMENTS ON MONITORING SYSTEMS
• Scalability
• Robustness
• Extensibility
• Manageability / Administratively Scalable
• Portability
• Overhead
• Security

CONCLUSION
Ganglia Nagios Prometheus
Pro
- Manageability / Administratively Scalable
- Extensibility
- Robustness
- Portability
- Overhead
Contra
- Geographical Scaling
- Security
- Overhead
- Manageability / Administratively Scalable
→ Nagios + Prometheus

4
MONITORING SOLUTION

SHOWCASE ENVIRONMENT
4 (ResourceManagers and NameNodes)
+ 252 * 2 (DataNodes and NodeManagers)
+ 252 * 3 (ApplicationMaster, Map and Reduce)
= 1264 processes at minimum.
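
The sum can be checked directly. The sketch below only restates the slide's arithmetic; the variable names are illustrative, and reading 252 as the number of worker nodes is an assumption.

```python
# Restating the slide's arithmetic; all names are illustrative.
MASTER_PROCESSES = 4      # ResourceManagers and NameNodes
WORKER_NODES = 252        # assumption: 252 is the number of worker nodes
STATIC_PER_WORKER = 2     # DataNode and NodeManager
DYNAMIC_PER_WORKER = 3    # ApplicationMaster, Map and Reduce

static_total = MASTER_PROCESSES + WORKER_NODES * STATIC_PER_WORKER
dynamic_total = WORKER_NODES * DYNAMIC_PER_WORKER
minimum_processes = static_total + dynamic_total

print(static_total, dynamic_total, minimum_processes)  # 508 756 1264
```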

CHALLENGE
“If a human operator needs to touch your system during normal operations, you have a bug.”
- Carla Geisser, Google

APPLICATION TYPES
Static Applications:
• ResourceManagers
• NameNodes
• DataNodes
• NodeManagers
Dynamic Applications:
• ApplicationMaster
• Map
• Reduce

SERVICE DISCOVERY - BACKGROUND
[Diagram, built up over four slides: Container1, Container2 and Container3; then a Registry with step 1; then a Client with step 2; then step 3]
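
The slides show the containers, a registry, a client, and three numbered steps without labelling them. Assuming they follow the usual service-discovery pattern (1: containers register themselves, 2: the client queries the registry, 3: the client contacts a discovered container), a minimal in-memory sketch could look as follows. The `Registry` class, the service names, and the addresses are hypothetical.

```python
from typing import Dict, List, Tuple

Address = Tuple[str, int]


class Registry:
    """In-memory stand-in for a service registry."""

    def __init__(self) -> None:
        self._services: Dict[str, List[Address]] = {}

    def register(self, service: str, address: Address) -> None:
        # Step 1: a container announces itself when it starts.
        self._services.setdefault(service, []).append(address)

    def deregister(self, service: str, address: Address) -> None:
        # A container that terminates is removed again.
        addresses = self._services.get(service, [])
        if address in addresses:
            addresses.remove(address)

    def lookup(self, service: str) -> List[Address]:
        # Step 2: a client asks where a service currently runs.
        return list(self._services.get(service, []))


def connect(address: Address) -> str:
    """Step 3: the client contacts a discovered container (simulated here)."""
    host, port = address
    return f"connected to {host}:{port}"


registry = Registry()
registry.register("map", ("10.0.0.1", 40001))     # Container1
registry.register("map", ("10.0.0.2", 40002))     # Container2
registry.register("reduce", ("10.0.0.3", 40003))  # Container3

registry.deregister("map", ("10.0.0.1", 40001))   # Container1 terminates

for address in registry.lookup("map"):
    print(connect(address))
```

Because the client resolves addresses at lookup time, it never needs a static list of containers, which is the property a monitoring system for short-lived processes would rely on.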

IDEA

JOCOSE

PCAP_EXPORTER
Network-traffic exporter
Filter:
• Source and Destination
- Port
- IP
• Protocol
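
The filter criteria listed above can be pictured as a predicate over packet metadata. The sketch below is not the pcap_exporter implementation and does not reflect its configuration format; `Packet`, `TrafficFilter`, and all field names are hypothetical. A criterion left as `None` matches any value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    """Hypothetical packet metadata record."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str
    size: int


@dataclass(frozen=True)
class TrafficFilter:
    """Hypothetical filter; None means 'do not restrict this field'."""
    src_ip: Optional[str] = None
    src_port: Optional[int] = None
    dst_ip: Optional[str] = None
    dst_port: Optional[int] = None
    protocol: Optional[str] = None

    def matches(self, packet: Packet) -> bool:
        criteria = (
            (self.src_ip, packet.src_ip),
            (self.src_port, packet.src_port),
            (self.dst_ip, packet.dst_ip),
            (self.dst_port, packet.dst_port),
            (self.protocol, packet.protocol),
        )
        return all(wanted is None or wanted == actual for wanted, actual in criteria)


packets = [
    Packet("10.0.0.1", 50010, "10.0.0.2", 43210, "tcp", 1500),
    Packet("10.0.0.3", 53, "10.0.0.2", 43211, "udp", 120),
    Packet("10.0.0.1", 50010, "10.0.0.4", 43212, "tcp", 900),
]

tcp_from_port = TrafficFilter(src_port=50010, protocol="tcp")
matched_bytes = sum(p.size for p in packets if tcp_from_port.matches(p))
print(matched_bytes)  # 2400
```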

GRAFANA GRAPH PANEL

GRAFANA NETWORK GRAPH PANEL

5
EVALUATION AND CONCLUSION

EVALUATION
UTILIZATION - RESOURCE MONETIZATION

CONCLUSION
• Fully automated monitoring
• Flexible and portable concept
• Scales with the monitored system

6
DEMO

7
Q & A
