This talk focuses on how the Schwarz Group, one of the largest retailers in the world, monitors its extremely large and complex environment with the OMD platform. OMD is an all-in-one monitoring solution that integrates Naemon/Thruk for host and service monitoring, Prometheus for metrics collection, and Grafana for data visualization. The monitoring extends from classic infrastructure, such as servers and networks, to cloud-based applications and even to business operations in the stores. This comprehensive approach provides the information and data each client group needs, allowing the Schwarz Group to optimize its operations and enhance the customer experience. We will discuss the benefits of using OMD and its tools in such a large environment, including its scalability, flexibility, and ease of use, and cover some of the challenges faced during the lifecycle of the monitoring solution. Overall, the talk offers insights into how OMD can monitor large-scale, complex infrastructures and how it helps organizations like the Schwarz Group improve their operations, keep their systems in top condition, and enhance their customer experience.
OSMC 2023 | Monitoring at one of the largest retail groups in the world by Matthias Gallinger & Tobias Kempf
Schwarz IT | 09.11.2023 | Monitoring at one of the largest retail groups in the world
Monitoring at one of the largest retail groups in the world
Matthias Gallinger & Tobias Kempf
Nürnberg, 09 November 2023
Speakers
• Matthias Gallinger, Monitoring consultant, ConSol Software GmbH
• Tobias Kempf, Solution Architect, Schwarz IT KG
Agenda
1. Introduction: ConSol and Schwarz Group
   Company information
2. Monitoring strategy of the Schwarz Group
   Overview and general idea
3. Monitoring of the IT landscape with OMD
   Technical facts and architecture
4. End-to-end website monitoring
   S.P.A.C.E. with Playwright
5. Internal marketing and customer contact
   Sell the idea and build a community
6. Operating
   Keep the infrastructure going
01 Introduction: ConSol and Schwarz Group
ConSol
• 4 countries
• 5 locations: Krakow (Poland), Vienna (Austria), San Francisco, Munich, Düsseldorf
• 260 employees (150 developers)
• 20 years of monitoring experience
Schwarz Group
• Over 575,000 employees
• Around 13,700 stores
• In 32 countries
• Around 6.8 billion store customers
• EUR 154.1 billion in sales in fiscal year 2022
Schwarz Digits division
• 7,500 IT and digital experts
• 1,250 IT and digital solutions
• 34 IT units worldwide
02 Monitoring strategy of the Schwarz Group
Monitoring categories: functional and technical monitoring
[Diagram: the "4+1" framework. Technology stacks for On-Prem Infrastructure Monitoring, Container Monitoring, STACKIT Monitoring, Cloud Infrastructure Monitoring, API & Synthetic Monitoring, Business Process Monitoring and Cloud Applications Monitoring, grouped into technical and functional monitoring, plus Observability (metrics, traces, logs); some stacks use commercial or cloud-vendor tools.]
• We provide a growing number of monitoring services -> Monitoring Appliance
03 Monitoring of the IT landscape with OMD
A look back at 2017 (OSMC 2017)
• What has changed since then?
• What has changed around us?
• Have we reached our goals?
  • Store monitoring
  • Scaling
  • Establish a community
• OSMC archive: https://t1p.de/OSMC-Archive-2017
KPIs of the current OMD environment
• More than 782,502 hosts, monitored by Naemon with 280 backends
• More than 5,778,628 services
• We check all our systems 24/7/365: 1,608,968,448 checks per day
• Availability (YoY): 99.703%
• Thruk users / Grafana users: 14,450 / 8,250
• Service checks: +40% year over year
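The daily totals are easier to reason about as rates. The sketch below is plain arithmetic on the figures from this slide; it assumes the checks are spread evenly over the day and over the 280 backends, and it treats every check as a service check, which the slide does not state.

```python
from __future__ import annotations

SECONDS_PER_DAY = 24 * 60 * 60


def load_figures(checks_per_day: int, services: int, backends: int) -> dict[str, float]:
    """Derive average rates from daily totals (assumes an even spread)."""
    checks_per_second = checks_per_day / SECONDS_PER_DAY
    checks_per_service_per_day = checks_per_day / services
    return {
        "checks_per_second": checks_per_second,
        "checks_per_service_per_day": checks_per_service_per_day,
        # average time between two executions of the same service check
        "mean_interval_s": SECONDS_PER_DAY / checks_per_service_per_day,
        "checks_per_backend_per_second": checks_per_second / backends,
    }


if __name__ == "__main__":
    figures = load_figures(1_608_968_448, 5_778_628, 280)
    for name, value in figures.items():
        print(f"{name}: {value:,.1f}")
```

On these assumptions the platform averages roughly 18,600 checks per second, each service is checked about every five minutes, and each backend handles about 66 checks per second.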
KPIs: service check history
[Chart: service check history, 2021 to 2023]
OMD framework: what's inside
• All tools, one package
• Preconfigured components that fit together
• Shipped with plenty of plugins and scripts
• Easy to automate
• No differences between OS distributions
• The site concept ensures easy updates
Package
• OMD package customized for Schwarz
  • Includes standard components
  • Customer-dependent add-on packages
  • Provides licensed software
  • Grafana plugins
  • Tools and toys
• https://t1p.de/omd-download
OMD framework: site concept
System architecture
Harmonized and standardized OMD instances for all parts of the group in all countries
[Diagram: a central view and management layer (LMD + Promxy) above the zones (Schwarz IT, Lidl, Kaufland, HQ, offices, warehouses, stores, Schwarz networks) and their locations (e.g. warehouses WH001 to WHxyz, stores, HQ).]
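The central view aggregates many zone backends into one place. LMD and Promxy are separate Go programs with their own logic; the Python sketch below only illustrates the underlying fan-out-and-merge idea under simplified assumptions: each backend is a hypothetical callable that returns rows, every row is tagged with the name of its backend, and an unreachable zone is reported instead of blanking the whole view.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# A fetcher returns the rows (e.g. host states) of one zone backend.
# Fetchers are expected to enforce their own network timeouts.
Fetcher = Callable[[], list]


def fan_out(backends: dict[str, Fetcher]) -> tuple[list[dict], dict[str, str]]:
    """Query all backends in parallel and merge their rows into one central view.

    Every row is tagged with the backend it came from. A backend that raises is
    recorded in the error map instead of breaking the whole view.
    """
    rows: list[dict] = []
    errors: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(backends))) as pool:
        futures = {name: pool.submit(fetch) for name, fetch in backends.items()}
        for name in sorted(futures):  # deterministic merge order
            try:
                rows.extend({**row, "backend": name} for row in futures[name].result())
            except Exception as exc:
                errors[name] = repr(exc)
    return rows, errors
```

Tagging each row with its origin is what allows a central UI to filter or group by zone afterwards; the zone names used in practice are not shown on the slide.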
Setup
• VM-based
• Ansible-managed deployment
  • OS setup
  • Application setup
  • Feature set chosen based on metadata
• Management node -> Naemon, Thruk, Prometheus
• Worker node -> mod_gearman
• Global OMD monitoring environment
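Choosing the feature set from metadata keeps the deployment generic: the same automation can build a management node or a worker node. A minimal Python sketch of that selection step, assuming a hypothetical metadata schema with `role` and `extra_features` keys and hypothetical component identifiers; the real schema used in the Ansible setup is not shown in the talk.

```python
from __future__ import annotations

# Hypothetical mapping from node role to the OMD components it should run.
ROLE_FEATURES = {
    "management": ["naemon", "thruk", "prometheus"],
    "worker": ["mod_gearman_worker"],
}


def features_for(metadata: dict) -> list[str]:
    """Derive the components to enable on a VM from its (hypothetical) metadata."""
    role = metadata.get("role")
    if role not in ROLE_FEATURES:
        raise ValueError(f"unknown role: {role!r}")
    features = list(ROLE_FEATURES[role])
    features.extend(metadata.get("extra_features", []))  # optional per-node add-ons
    # remove duplicates while keeping order, so the result is stable across runs
    return list(dict.fromkeys(features))
```

Raising on an unknown role makes a typo in the metadata fail the deployment early instead of producing a half-configured node.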
Scaling: Thruk web UI
• Central Thruk cluster
• Livestatus Multitool Daemon (LMD) for caching
• Logcache for reports and historical data
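Thruk and LMD talk to the backends via the Livestatus protocol: a plain-text query of header lines terminated by an empty line, sent over a socket. A minimal Python client sketch, assuming LMD (or a single Livestatus backend) is reachable over TCP; host and port in the usage note are placeholders, not the values used at Schwarz.

```python
from __future__ import annotations

import json
import socket


def build_query(table: str, columns: list[str], filters: list[str] | None = None) -> str:
    """Build a Livestatus query; a blank line terminates the request."""
    lines = [f"GET {table}", "Columns: " + " ".join(columns)]
    lines += [f"Filter: {flt}" for flt in filters or []]
    lines.append("OutputFormat: json")
    return "\n".join(lines) + "\n\n"


def parse_rows(raw: bytes, columns: list[str]) -> list[dict]:
    """Map the JSON list-of-lists answer onto dicts keyed by column name."""
    return [dict(zip(columns, row)) for row in json.loads(raw)]


def query(host: str, port: int, table: str, columns: list[str],
          filters: list[str] | None = None) -> list[dict]:
    """Send one query over TCP and read the answer until the server closes."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_query(table, columns, filters).encode())
        sock.shutdown(socket.SHUT_WR)  # tell the server the request is complete
        chunks = []
        while chunk := sock.recv(65536):
            chunks.append(chunk)
    return parse_rows(b"".join(chunks), columns)
```

For example, `query("lmd.example.internal", 6557, "services", ["host_name", "description", "state"], ["state != 0"])` would list the non-OK services of every backend the daemon aggregates.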
Scaling: Naemon and Gearmand
• Gearmand and mod_gearman for check execution
• Multiple monitoring probes for scaling
• Automatic setup of additional worker probes
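With mod_gearman, Naemon hands checks to Gearman queues instead of executing them locally, and workers on the probes pull jobs from the queues they subscribe to. mod_gearman can route checks of selected hostgroups or servicegroups into dedicated queues (named `hostgroup_<name>` and `servicegroup_<name>`), which is one way to pin checks to a probe close to the monitored location. A simplified Python sketch of that routing decision; the precedence (servicegroup before hostgroup) and picking the alphabetically first match are assumptions of this sketch, not taken from mod_gearman's source.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Check:
    host: str
    service: str | None = None  # None means a host check
    hostgroups: set[str] = field(default_factory=set)
    servicegroups: set[str] = field(default_factory=set)


def target_queue(check: Check, hostgroup_queues: set[str], servicegroup_queues: set[str]) -> str:
    """Choose the Gearman queue a check is submitted to (simplified sketch)."""
    if check.service is not None:
        matches = sorted(check.servicegroups & servicegroup_queues)
        if matches:
            return f"servicegroup_{matches[0]}"
    matches = sorted(check.hostgroups & hostgroup_queues)
    if matches:
        return f"hostgroup_{matches[0]}"
    # everything else goes to the default queues
    return "host" if check.service is None else "service"
```

Adding capacity then means starting another worker that subscribes to the same queue; no change on the Naemon side is needed.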
CMDB and self-service portals: lessons learned
• A single CMDB is unrealistic
• Let them bring the data to you
• Set guidelines and standards
• Create a monitoring source of truth
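A monitoring source of truth can be built by merging what several sources deliver and checking the result against the agreed guidelines. A minimal Python sketch, assuming hypothetical record fields (`name`, `address`, `owner`) and a hypothetical guideline that every host needs an address and an owner; later sources override earlier ones field by field.

```python
from __future__ import annotations

# Hypothetical guideline: every monitored host must at least have these fields.
REQUIRED_FIELDS = {"address", "owner"}


def merge_sources(sources: list[tuple[str, list[dict]]]) -> tuple[dict[str, dict], list[str]]:
    """Merge host records from several sources into one monitoring inventory.

    Sources are applied in order, so a later source (e.g. a self-service portal)
    overrides earlier ones field by field; None values never overwrite data.
    Returns the inventory keyed by lower-case host name and a list of hosts
    that violate the guideline.
    """
    inventory: dict[str, dict] = {}
    for source_name, records in sources:
        for record in records:
            name = record["name"].lower()
            host = inventory.setdefault(name, {"name": name, "sources": []})
            host.update({k: v for k, v in record.items()
                         if k not in ("name", "sources") and v is not None})
            host["sources"].append(source_name)
    problems = [
        f"{name}: missing {', '.join(sorted(REQUIRED_FIELDS - set(host)))}"
        for name, host in sorted(inventory.items())
        if not REQUIRED_FIELDS.issubset(host)
    ]
    return inventory, problems
```

Recording which sources contributed to each host makes it easy to tell a data owner where a wrong value came from.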
Naemon configuration: coshsh framework
• Central config management, generation and revisioning
• Top-down distribution via git repositories
• Multiple data sources
  • CMDBs
  • IDMs
  • Self-service portal
• Automate everything but allow customization
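coshsh has its own datasource and templating API, which this sketch does not use. It only illustrates the core idea of generating Naemon object definitions from inventory records, with deterministic output so that revisioning in git produces clean diffs. The record fields and the `generic-host` template name are assumptions.

```python
from __future__ import annotations


def render_host(host: dict) -> str:
    """Render one Naemon host object definition from an inventory record."""
    lines = [
        "define host {",
        f"  host_name  {host['name']}",
        f"  address    {host['address']}",
        f"  use        {host.get('template', 'generic-host')}",  # hypothetical default template
    ]
    if host.get("hostgroups"):
        lines.append(f"  hostgroups {','.join(sorted(host['hostgroups']))}")
    lines.append("}")
    return "\n".join(lines)


def render_config(inventory: list[dict]) -> str:
    """Render all hosts sorted by name so repeated runs yield identical files."""
    hosts = sorted(inventory, key=lambda h: h["name"])
    return "\n\n".join(render_host(h) for h in hosts) + "\n"
```

Sorting both the hosts and their hostgroups means a regenerated file only changes where the underlying data changed.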
Prometheus / Grafana: scrape config
• Default Grafana dashboards
• Default scrape targets generated by coshsh
• exporter_exporter for servers
• Example on a Postgres VM:
  Exporter Exporter: <Servername>:5757
  ├── Node Exporter: 127.0.0.1:5758
  ├── Postgres Exporter: 127.0.0.1:5759
  └── Additional exporters: 127.0.0.1:57xx
• Automatic discovery of exporters
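exporter_exporter exposes a single port per server (5757 in the example above) and proxies to the local exporters through its `/proxy` endpoint, selected by a `module` URL parameter. A minimal sketch that generates Prometheus file-based service discovery entries for that layout; it assumes the module names (`node`, `postgres`) match the exporter_exporter configuration and relies on Prometheus honouring the reserved `__metrics_path__` and `__param_<name>` labels on discovered targets. It is not the coshsh generator mentioned on the slide.

```python
from __future__ import annotations

import json

EXPORTER_EXPORTER_PORT = 5757  # port from the example above


def file_sd_targets(servers: dict[str, list[str]]) -> list[dict]:
    """Build file_sd target groups: one group per (server, exporter module)."""
    groups = []
    for server, modules in sorted(servers.items()):
        for module in modules:
            groups.append({
                "targets": [f"{server}:{EXPORTER_EXPORTER_PORT}"],
                "labels": {
                    "__metrics_path__": "/proxy",  # scrape the proxy endpoint
                    "__param_module": module,      # becomes ?module=<name>
                    "exporter": module,            # visible label for dashboards
                },
            })
    return groups


if __name__ == "__main__":
    print(json.dumps(file_sd_targets({"pg01.example": ["node", "postgres"]}), indent=2))
```

Because only one port per server has to be opened and scraped, adding another exporter on a host means adding a module name, not a firewall rule.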
04 End-to-end website monitoring
S.P.A.C.E.: Schwarz Playwright Automatic Check Execution
• Synthetic end-to-end (E2E) website monitoring
• Based on the Playwright framework
• Monitoring as code with a CI/CD pipeline
• Visual Studio Code extension
• Integrated in OMD
  • Standard website monitoring -> Naemon
  • Advanced "click paths" -> S.P.A.C.E.
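To appear in Thruk and Grafana, a click-path run has to end up as a check result. The sketch below shows one way a result could be reported using the standard monitoring plugin conventions (exit codes 0 to 3, performance data after `|`). It is not the S.P.A.C.E. implementation: the step tuples, the thresholds and the output wording are assumptions, and the Playwright run itself is not shown.

```python
from __future__ import annotations

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def to_plugin_result(steps: list[tuple[str, float, bool]],
                     warn_s: float, crit_s: float) -> tuple[int, str]:
    """Turn click-path steps (name, seconds, passed) into an exit code and output line."""
    if not steps:
        return UNKNOWN, "UNKNOWN - no steps were executed"
    total = sum(seconds for _, seconds, _ in steps)
    failed = [name for name, _, passed in steps if not passed]
    if failed:
        state, summary = CRITICAL, "failed step(s): " + ", ".join(failed)
    else:
        state = CRITICAL if total >= crit_s else WARNING if total >= warn_s else OK
        summary = f"click path took {total:.2f}s"
    # performance data: one value per step plus the total with its thresholds
    perfdata = " ".join(f"'{name}'={seconds:.3f}s" for name, seconds, _ in steps)
    perfdata += f" 'total'={total:.3f}s;{warn_s};{crit_s}"
    return state, f"{STATE_NAMES[state]} - {summary} | {perfdata}"
```

Emitting one performance value per step lets a dashboard show which part of the click path got slower, not just that the total did.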
S.P.A.C.E.: architecture
S.P.A.C.E.: visualization in Grafana
S.P.A.C.E.: visualization in Thruk
S.P.A.C.E.: visualization as a Playwright report
S.P.A.C.E.: visualization as a Playwright trace
05 Internal marketing and customer contact
Customers: who are the customers?
• Different customers -> different needs
• Technical skills vary widely
• Enabling through self-service and flexibility
Communities
• Building communities
• Offer and request contact people
• Work together and stay in contact
Communication channels
• Create dedicated communication channels
  • Slack/MS Teams channels
  • Documentation pages
  • Regular blog entries
Communication channels: work together
Monitoring Expert Group
• Interested users and experts in their fields
• Different backgrounds connected through monitoring
Hackathon
• Pick dedicated topics/issues
• Work together
• Share knowledge
Trainings
• Regular trainings for tools and workflows
• Different skill levels
• Hands-on sessions, with documentation for later reference
06 Operating
Operating: how complex is the operation?
• 16 internal team members
• 2 external partners
• Tasks of the Monitoring and Observability Solutions team:
  • Offers all technical products as solutions
  • Handles contracts and internal billing
  • Handles approx. 800 support requests per month
  • Provides 24/7 support for all solutions
Matthias Gallinger
Monitoring consultant
ConSol Software GmbH
Tobias Kempf
Solution Architect
Monitoring and Observability Solutions
Schwarz IT KG