The talk is divided into two parts:
The first part presents how T-Mobile uses Nagios to monitor GSM networks in its International Roaming department. It describes the international design that T-Mobile Deutschland chose within the T-Mobile Group in Europe and illustrates the Nagios service checks in use, focusing on those developed specifically for GSM networks. It also shows how T-Mobile integrated Nagios for KPI measurement, alarming and performance measurement.
The second part focuses on the environment which T-Mobile built for running several central Nagios servers and distributed NRPEs in the T-Mobile national companies.
At the end of the presentation a brief live demonstration is planned.
Nagios Monitors T-Mobile's Global GSM Networks
11.09.2008, Nagios for service monitoring in GSM-based networks at T-Mobile
NETWAYS Nagios Conference 2008
Using Nagios for service monitoring in GSM-based T-Mobile networks
Using Nagios for service monitoring in GSM-based networks at T-Mobile
Introducing Network, Service and Host Management for T-Mobile European Service Operation Centre International Roaming
Frank März
frank.maerz@t-mobile.de
Christian Hirsch
christian.hirsch@t-mobile.de
T-Mobile ESOC IR
European Service Operation Centre for International Roaming
Started in 1993, when international roaming was introduced together with Italy
Today managing roaming services for
T-Mobile Deutschland
T-Mobile Austria
T-Mobile UK
T-Mobile Netherlands
T-Mobile Czech Rep.
and supporting T-Mobile national companies in Poland, Slovakia, Croatia, USA, Hungary
Core team (17) based in Nuremberg, Germany
T-Mobile ESOC IR
Tasks
IREG (GSM Association IR Expert Group)
Testing new roaming partners for any type of service
Voice roaming, prepaid roaming, data roaming, WLAN, MMS interworking
Network troubleshooting
Roaming Engineering
Introducing new roaming and inter-working services
Active network testing
Network monitoring
Service Interface Desk
Interface desk for roaming partners and carriers
Technical support for customer care
SIM Card management
Reporting
The most international Nagios implementation
T-Mobile uses 3 Nagios installations to monitor
205 countries in the world
530 foreign networks
every 5 minutes!
T-Mobile ESOC IR service monitoring philosophy
Layer 1 Connectivity (NAGIOS)
Between the IP core networks for all packet service roaming partners
Between all CS (voice) roaming partners
Towards all used equipment
Layer 2 Performance (partly NAGIOS)
Service confirmation
Performance data capturing
Layer 3 Verification
Performance data analysis
Layer 1 - Connectivity
These active tests are executed at short intervals, so an outage can be recognized immediately.
In the controlled environment (10%):
by standard network management tools (e.g. PING)
In the uncontrolled environment (90%):
by simulated “user” traffic (e.g. SMTP-Mail-From)
by simulated “control” traffic (e.g. GTP-Echo)
Ensuring connectivity for service availability:
“Connectivity is the basis for every IT service”
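The simulated “user” traffic mentioned on this slide (an SMTP dialog up to MAIL FROM) could be sketched as a Nagios-style plugin function roughly as follows. This is a minimal sketch and not the plugin used at T-Mobile: the function name, the thresholds and the `smtp_factory` parameter are assumptions introduced for illustration. It only uses Python's standard `smtplib` and the usual Nagios exit codes.

```python
import smtplib
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_smtp_mail_from(host, sender, port=25, timeout=10.0,
                         warn_seconds=2.0, smtp_factory=smtplib.SMTP):
    """Run a partial SMTP dialog (EHLO, MAIL FROM, RSET, QUIT) against host.

    Returns (exit_code, status_line). No message is ever delivered.
    smtp_factory is hypothetical: it only exists so the dialog can be
    exercised without a network connection.
    """
    start = time.monotonic()
    try:
        conn = smtp_factory(host, port, timeout=timeout)
        try:
            conn.ehlo()
            code, _reply = conn.mail(sender)  # sends "MAIL FROM:<sender>"
            conn.rset()                       # abort the transaction
        finally:
            conn.quit()
    except (OSError, smtplib.SMTPException) as exc:
        return CRITICAL, f"CRITICAL - SMTP dialog with {host} failed: {exc}"

    elapsed = time.monotonic() - start
    perfdata = f"time={elapsed:.3f}s"  # performance data after the pipe
    if code != 250:
        return CRITICAL, f"CRITICAL - MAIL FROM rejected with {code} | {perfdata}"
    if elapsed > warn_seconds:
        return WARNING, f"WARNING - MAIL FROM accepted in {elapsed:.3f}s | {perfdata}"
    return OK, f"OK - MAIL FROM accepted in {elapsed:.3f}s | {perfdata}"
```

A wrapper script would print the status line and exit with the returned code, which is how Nagios reads plugin results.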
Layer 2 - Performance monitoring
Checking system function
Does the system provide the service it offers? (e.g. does a DNS server respond to a DNS request?)
Requesting status information
Utilizes network management protocols to gather status information (load, temperature, disk usage)
Using real user data traffic
Capture user traffic and check if it’s correct (protocol analyzer)
Layer 3 - Traffic Verification
Compare performance results over a period of time
Different values may indicate a load or bottleneck issue (e.g. compare Round Trip Time values)
Look at complete call details for a single user
Filter for a single user connection in order to find problems on the bit level
Run statistical analysis on captured network traffic
Utilize captured user data for statistical analysis in order to measure success rates and performance (e.g. Create PDP Context Reply Rate)
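The two kinds of analysis named on this slide, comparing Round Trip Time values over a period of time and measuring a reply rate, can be illustrated with a small sketch. The function names, the three-sigma threshold and the sample values are assumptions chosen for illustration; the slides do not say how T-Mobile evaluates these figures.

```python
from statistics import mean, pstdev


def reply_rate(requests, replies):
    """Share of requests that were answered, as a percentage."""
    if requests == 0:
        return 0.0
    return 100.0 * replies / requests


def rtt_outliers(history_ms, recent_ms, sigmas=3.0):
    """Return the recent RTT samples that lie more than `sigmas` standard
    deviations above the mean of the historical samples."""
    baseline = mean(history_ms)
    spread = pstdev(history_ms)
    threshold = baseline + sigmas * spread
    return [rtt for rtt in recent_ms if rtt > threshold]


# Hypothetical samples: 200 Create PDP Context requests, 190 replies.
print(reply_rate(200, 190))
# Hypothetical RTT history (ms) compared with three fresh measurements.
print(rtt_outliers([40, 42, 38, 41, 39], [41, 43, 95]))
```

A sustained run of outliers, rather than a single one, would be the more useful signal for a load or bottleneck issue.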
History of Nagios at T-Mobile ESOC IR
2003
T-Mobile ESOC IR started testing Nagios with DNS and SNMP checks
2004
GTP (GPRS Tunnelling Protocol) plugin for Nagios allowed us to simulate a GSM core node (SGSN)
2005
Support contract with Netways
Introduced Nagios Grapher
Including server monitoring
NRPE design / start of rollout to other T-Mobile networks
2006
Integrated gateway into SS7 network together with Telesoft Technologies (UK)
KPI performance monitoring reporting
2007
International rollout for SS7 gateways
2008
Nagios 3 on virtual XEN environment
Nagios – the perfect match for connectivity checks
Active checks:
Connectivity check
Retrieving network data
This requires a solution which is capable of:
Performing connectivity checks
Retrieving network data
Scheduling these tasks
Presenting the results and forwarding performance data to other systems
Sending alarms to external systems
Very powerful
Extremely flexible
It may be complex to manage and likely very expensive.
Not with Nagios
Nagios T-Mobile for IP (Nagios Master)
[Diagram: Nagios TMO IP (nagios-master) and NRPEs in local T-Mobile backbone networks]
IP connectivity monitoring for GPRS / 3G
Checking MMS Inter-working (SMTP dialogs towards MMS Centers)
WLAN Roaming (Radius authentication)
Central Nagios Server with access to NRPEs in IP core networks in Germany, UK, Netherlands, Austria
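How a central server reaches a check on a remote NRPE can be sketched with the standard `check_nrpe` client, which takes the remote host with `-H`, a timeout with `-t` and the remote command name with `-c`. The plugin path, host name and command name below are assumptions for illustration only and are not taken from the T-Mobile setup.

```python
import subprocess

# Assumption: a common install path for the check_nrpe client; adjust as needed.
CHECK_NRPE = "/usr/lib/nagios/plugins/check_nrpe"


def nrpe_command(host, command, timeout=10, check_nrpe=CHECK_NRPE):
    """Build the argument list for one remote check via NRPE."""
    return [check_nrpe, "-H", host, "-t", str(timeout), "-c", command]


def run_remote_check(host, command, timeout=10):
    """Run the remote check; returns (exit_code, first line of plugin output)."""
    try:
        proc = subprocess.run(nrpe_command(host, command, timeout),
                              capture_output=True, text=True,
                              timeout=timeout + 5)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # 3 is the Nagios UNKNOWN state.
        return 3, f"UNKNOWN - could not run check_nrpe: {exc}"
    lines = proc.stdout.strip().splitlines()
    return proc.returncode, lines[0] if lines else ""


# Hypothetical host and command names.
print(nrpe_command("nrpe.uk.example", "check_ggsn_partner"))
```

In a real Nagios installation the same call would normally be written as a command definition in the Nagios configuration rather than wrapped in Python; the wrapper only makes the data flow explicit.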
Nagios TMD
System Monitoring with Nagios
Standard system health checks such as:
hardware health
ping
load
ssh
disk space
services
…
performance / capacity monitoring:
router traffic
RTTs
route availability
…
Nagios T-Mobile SS7
Connectivity checks for voice roaming
Central Nagios Server triggers MAP dialogs on the Telesoft Technologies application server, which runs NRPE
[Diagram labels: NAGIOS, SS7]
The application server opens the MAP dialog in the local T-Mobile network
Summary of all Nagios service checks used for GSM networks
Nagios checks “everything” every 5 minutes, over 250,000,000 checks a year
Connectivity check for GSM networks
Packet roaming – “GTP Echo”
MMS Interworking – “SMTP dialog”
CS Roaming – “MAP dialogs”
WLAN Roaming – Radius authentication
Performance
BGP routes to roaming partners
BGP peers status to neighbors
Interface status for physical links
Link usage
ftp/sftp connections
Server load, users, temperature, disk usage, RAID status, power supply, fans, zombie processes
Running processes
Login (SSH, Telnet)
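The yearly figure on this slide can be related to the 5-minute interval with a short calculation. The sketch assumes that the total counts every execution of a service check and that all checks run on the same 5-minute schedule; neither is stated explicitly in the slides.

```python
import math

CHECK_INTERVAL_MINUTES = 5
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def runs_per_year(interval_minutes=CHECK_INTERVAL_MINUTES):
    """How often one service check runs per year at the given interval."""
    return MINUTES_PER_YEAR // interval_minutes


def services_for(total_checks_per_year, interval_minutes=CHECK_INTERVAL_MINUTES):
    """Smallest number of service checks that reaches the yearly total."""
    return math.ceil(total_checks_per_year / runs_per_year(interval_minutes))


print(runs_per_year())            # 105120 runs per service per year
print(services_for(250_000_000))  # 2379 service checks
```

Under these assumptions, 250,000,000 checks a year corresponds to roughly 2,400 individual service checks across the three installations.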
Technical Realization
Christian Hirsch
PART 2
Technical Realization
Special Plugin Design
GPRS / 3G Roaming network environment
[Diagram labels: MNO B, Peering Exchange, GRX1, GRX2, GGSN, CPE, local tail, T-Mobile IP Border Gateway (BG), Nagios NRPE, DNS servers, IP networks; message flow: DNS.req, DNS.res, GTP-Echo.req, GTP-Echo.res]
How check_ggsn works:
Nagios acts like a GSM network node (SGSN)
It uses the DNS protocol to resolve the APN (access point name) for IR partners
The DNS responds with the IP address of the roaming partner's home GGSN
The NRPE sends a GTP Echo towards the GGSN IP address
If the GGSN responds, the connectivity is OK
RTT is displayed in Nagios Grapher; RTT indicates backbone bottlenecks
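The check_ggsn steps on this slide (resolve the APN via DNS, send a GTP Echo to the GGSN, treat a response as OK and record the RTT) can be sketched as follows. This is not the check_ggsn plugin itself, whose source is not shown in the slides. The sketch assumes GTPv1 on the control-plane UDP port 2123 and uses the system resolver; an older GTP version or a dedicated roaming DNS would need different values. All function names are hypothetical.

```python
import socket
import struct
import time

GTP_C_PORT = 2123   # GTPv1-C control plane port (assumption: GTPv1 is used)
ECHO_REQUEST = 1    # GTP message type: Echo Request
ECHO_RESPONSE = 2   # GTP message type: Echo Response


def build_echo_request(sequence):
    """Build a 12-byte GTPv1 Echo Request.

    Flags 0x32 = version 1, protocol type GTP, sequence number present.
    Length 4 covers the optional fields after the 8-byte mandatory header:
    sequence number (2), N-PDU number (1), next extension header type (1).
    """
    return struct.pack("!BBHIHBB", 0x32, ECHO_REQUEST, 4, 0,
                       sequence & 0xFFFF, 0, 0)


def is_echo_response(packet, sequence):
    """True if packet is a GTPv1 Echo Response carrying our sequence number."""
    if len(packet) < 12:
        return False
    flags, msg_type, _length, _teid, seq = struct.unpack("!BBHIH", packet[:10])
    return (flags >> 5) == 1 and msg_type == ECHO_RESPONSE \
        and seq == (sequence & 0xFFFF)


def check_ggsn(apn_fqdn, timeout=5.0, sequence=1):
    """Resolve the APN, send one GTP Echo, return (nagios_code, status_line)."""
    try:
        ggsn_ip = socket.gethostbyname(apn_fqdn)        # DNS.req / DNS.res
    except OSError as exc:
        return 2, f"CRITICAL - DNS lookup for {apn_fqdn} failed: {exc}"
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        try:
            sock.sendto(build_echo_request(sequence), (ggsn_ip, GTP_C_PORT))
            reply, _addr = sock.recvfrom(4096)          # GTP-Echo.res
        except OSError as exc:
            return 2, f"CRITICAL - no GTP Echo Response from {ggsn_ip}: {exc}"
        rtt = time.monotonic() - start
    if not is_echo_response(reply, sequence):
        return 2, f"CRITICAL - unexpected reply from {ggsn_ip}"
    return 0, f"OK - GTP Echo Response from {ggsn_ip} | rtt={rtt:.3f}s"
```

The `rtt=` value after the pipe is Nagios performance data, which is the kind of value a grapher can plot over time.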
Voice roaming network environment
[Diagram labels: NAGIOS, signalling gateway, MNO A, MNO B, SS7 carriers]
NAGIOS interacts with an SS7 gateway which “speaks” GSM MAP (3GPP TS 29.002)
The gateway was designed by T-Mobile and Telesoft Technologies
This allows Nagios to simulate GSM functions like registering to a network, initiating calls or SMS
Technical Realization
Nagios 3 on virtual XEN environment
Nagios 3 on virtual XEN environment
reduced hardware costs
High Availability
minimize downtimes during scheduled maintenance
easy backups
reduced power consumption and need for cooling (GREEN IT)
[Diagram labels: nagios-tmd, nagios-master, nagios-ss7]
Any Questions?
“Now it's time for a live demo…”