Open source tools for optimizing
your peering infrastructure
@ DE-CIX TechMeeting 2018-06-06
by Daniel Czerwonk
Who is this guy? About me…
• Software / Network Engineer at Mauve Mailorder Software
• Head of Network at Freifunk Essen e.V.
• AS44821 (Mauve), AS206356 (Freifunk Essen e.V.), AS202739 (routing-rocks)
• birdwatcher and bio-routing contributor
• Twitter: @dan_nrw
• GitHub: https://github.com/czerwonk
• LinkedIn: https://www.linkedin.com/in/czerwonk/
Our journey starts late 2016
A new networking setup is about to be built
But before that:
Let’s talk about monitoring…
Why is monitoring important for me?
• Very small operations team
• Freifunk Essen should demand even less operational effort
• Identify trends/anomalies early
• Capacity planning (beware of retention)
• Source for alerting
• Starting point for traffic engineering, etc.
• Basis for post-mortems (in case of an outage)
• Dashboard to give a quick overview when needed
So, let’s build a monitoring system…
What I wanted…
• Prometheus to collect metrics
• Grafana to visualize metrics
• Alertmanager with Pushover integration for alerting
• Everything managed with Ansible
What did I want to scrape?
• Bird routing daemon
• JunOS running on a few EX series switches
• Host metrics from bare-metal software router machines (statistics, resources)
• External network latencies (RIPE ATLAS, etc.)
What I found…
In 2016…
| Metric | Solution | Problem |
|---|---|---|
| bird | (none) | no exporter available |
| JunOS | snmp_exporter | complex configuration, bad performance |
| Host metrics | node_exporter | |
| Network latencies | blackbox_exporter with external probe VMs | bad coverage, only one request per scrape |
node_exporter
• Official Prometheus project
• Runs on Linux hosts (e.g. routers)
• Network interface metrics
• Resource consumption: CPU load, RAM usage, disk space
• Interrupts / context switches
• License: Apache 2.0
• Source: https://github.com/prometheus/node_exporter
At least we got the host metrics covered.
And the rest?
I had to solve that…
So I started to write some exporters…
Which programming language?
I chose golang:
• Performance is a key feature
• Need for concurrent processing
• Single binary / no dependencies
• Easy installation via go get …
• Existing client API for Prometheus
• I love writing code in golang in my spare time
Milestones to an exporter suite (2016 to 2018)
• bird_exporter for Bird 1.x (late 2016), later support for bird 2.x
• atlas_exporter for RIPE ATLAS (early 2017), plus a RIPE LABS article
• junos_exporter for Juniper JunOS using SNMP (late 2017), SNMP later replaced by SSH (early 2018)
• ping_exporter for ICMP probing (early 2018)
• mikrotik-exporter for RouterOS
bird_exporter
• Started late 2016
• Communicates with bird via its control socket
• Bird 1.x and 2.x supported
• Protocols: BGP, OSPFv2, OSPFv3, Kernel, Static, Device, Direct
• License: MIT
• Source: https://github.com/czerwonk/bird_exporter
bird_exporter
bird_protocol_prefix_import_count{proto=~"BGP|OSPFv3",ip_version="6"}
count(bird_protocol_up{proto="BGP"} == 1)
bird_exporter - Features
• BGP session state metrics
• BGP message counts (received, sent, withdrawn, etc.)
• Prefix counts for all supported protocols (imported, exported, filtered, etc.)
• OSPFv2/OSPFv3 neighbour counts
• Protocol uptime
ping_exporter
• Started early 2018
• Replacement for RRD-based SmokePing
• For ICMP, also a replacement for blackbox_exporter, which lacks loss detection
• Based on go-ping by Digineo: https://github.com/digineo/go-ping
• License: MIT
• Source: https://github.com/czerwonk/ping_exporter
ping_exporter
ping_rtt_mean_ms{ip_version="6"}
ping_loss_percent{ip_version="4"}
ping_exporter - Features
• Sends multiple ICMP ECHO requests per target and aggregates the results
• Round-trip metrics (current, best, worst)
• Simple way to detect loss
• Supports multiple targets
• DNS refresh ensures the correct IP is measured when a DNS record changes
• Only ICMP is supported at the moment
• Warning: ICMP is not user traffic, so keep that in mind when interpreting these metrics
atlas_exporter
• Started early 2017
• Generates metrics by requesting measurement results from RIPE ATLAS
• Useful to get an outside view from many other networks
• License: LGPL3 (since the binding it uses is licensed under LGPL3)
• Source: https://github.com/czerwonk/atlas_exporter
• More info: https://labs.ripe.net/Members/daniel_czerwonk/using-ripe-atlas-measurement-results-in-prometheus-with-atlas_exporter
atlas_exporter
avg(atlas_ping_avg_latency{ip_version="4"}) by (asn)
avg(atlas_traceroute_hops{ip_version="4"}) by (asn)
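The first query averages the per-probe ping latency over all probes in the same ASN. The Go sketch below spells out that aggregation: filter on IP version, group by ASN, then average within each group. The `probeResult` type and its fields are hypothetical stand-ins for decoded RIPE ATLAS results and are not atlas_exporter's data model.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// probeResult is a hypothetical, decoded RIPE ATLAS ping result.
type probeResult struct {
	ProbeID    int
	ASN        int
	IPVersion  int
	AvgLatency float64 // milliseconds
}

// avgLatencyByASN mirrors avg(atlas_ping_avg_latency{ip_version="4"}) by (asn):
// filter on IP version, group by ASN, then average within each group.
func avgLatencyByASN(results []probeResult, ipVersion int) map[int]float64 {
	sums := map[int]float64{}
	counts := map[int]int{}
	for _, r := range results {
		if r.IPVersion != ipVersion {
			continue
		}
		sums[r.ASN] += r.AvgLatency
		counts[r.ASN]++
	}
	avg := make(map[int]float64, len(sums))
	for asn, sum := range sums {
		avg[asn] = sum / float64(counts[asn])
	}
	return avg
}

// formatByASN renders the averages sorted by ASN so the output is stable.
func formatByASN(avg map[int]float64) string {
	asns := make([]int, 0, len(avg))
	for asn := range avg {
		asns = append(asns, asn)
	}
	sort.Ints(asns)
	var sb strings.Builder
	for _, asn := range asns {
		fmt.Fprintf(&sb, "asn=%d avg_latency_ms=%g\n", asn, avg[asn])
	}
	return sb.String()
}

func main() {
	// Documentation ASNs (RFC 5398) with illustrative latencies.
	results := []probeResult{
		{ProbeID: 1, ASN: 64496, IPVersion: 4, AvgLatency: 10},
		{ProbeID: 2, ASN: 64496, IPVersion: 4, AvgLatency: 20},
		{ProbeID: 3, ASN: 64500, IPVersion: 4, AvgLatency: 5},
		{ProbeID: 4, ASN: 64500, IPVersion: 6, AvgLatency: 50},
	}
	fmt.Print(formatByASN(avgLatencyByASN(results, 4)))
}
```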
atlas_exporter - Features
• Ping (success, min/max/avg latency, dups, size)
• Traceroute (success, hop count, rtt)
• NTP (delay, derivation, ntp version)
• DNS (success, rtt)
• HTTP (return code, rtt, http version, header size, body size)
• SSL Certificates (alert, rtt)
junos_exporter
• Started late 2017
• snmp_exporter did not perform as required
• First implementation used a simple set of SNMP OIDs
• Early 2018: reimplementation using SSH and the XML RPC representation
• Alternative to Juniper's OpenNTI, since telemetry is only supported on newer versions of JunOS and hardware
• License: MIT
• Source: https://github.com/czerwonk/junos_exporter
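Using the XML representation lets an exporter decode structured output instead of scraping CLI text. The sketch below shows that decoding step with Go's encoding/xml. It assumes the element names seen in Junos `show interfaces | display xml` output (`interface-information`, `physical-interface`, `traffic-statistics`, `input-bytes`, `output-bytes`). The SSH transport is left out, and the metric names are hypothetical, not necessarily junos_exporter's.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strconv"
	"strings"
)

// rpcReply captures only the interface counters; other elements are ignored.
type rpcReply struct {
	Interfaces []physicalInterface `xml:"interface-information>physical-interface"`
}

type physicalInterface struct {
	Name        string `xml:"name"`
	InputBytes  string `xml:"traffic-statistics>input-bytes"`
	OutputBytes string `xml:"traffic-statistics>output-bytes"`
}

// interfaceMetrics decodes the XML reply and renders byte counters as
// Prometheus samples. Junos pads values with newlines, hence TrimSpace.
func interfaceMetrics(reply []byte) (string, error) {
	var r rpcReply
	if err := xml.Unmarshal(reply, &r); err != nil {
		return "", err
	}
	var sb strings.Builder
	for _, ifc := range r.Interfaces {
		name := strings.TrimSpace(ifc.Name)
		counters := []struct{ metric, raw string }{
			{"junos_interface_receive_bytes", ifc.InputBytes},
			{"junos_interface_transmit_bytes", ifc.OutputBytes},
		}
		for _, c := range counters {
			v, err := strconv.ParseUint(strings.TrimSpace(c.raw), 10, 64)
			if err != nil {
				continue // counter missing for this interface (e.g. no traffic-statistics)
			}
			fmt.Fprintf(&sb, "%s{name=%q} %d\n", c.metric, name, v)
		}
	}
	return sb.String(), nil
}

func main() {
	// Trimmed, illustrative reply; real replies contain many more elements.
	reply := []byte(`<rpc-reply xmlns:junos="http://xml.juniper.net/junos/15.1R5/junos">
<interface-information xmlns="http://xml.juniper.net/junos/15.1R5/junos-interface" junos:style="normal">
<physical-interface>
<name>
ge-0/0/0
</name>
<traffic-statistics junos:style="brief">
<input-bytes>
1000
</input-bytes>
<output-bytes>
2000
</output-bytes>
</traffic-statistics>
</physical-interface>
</interface-information>
</rpc-reply>`)
	out, err := interfaceMetrics(reply)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Print(out)
}
```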
junos_exporter - Features
• Interfaces (bytes transmitted/received, errors, drops)
• Routes (per table, by protocol)
• Alarms (count)
• BGP (message count, prefix counts per peer, session state)
• OSPFv2, OSPFv3 (number of neighbours)
• Interface diagnostics (optical signals)
• IS-IS (number of adjacencies, total number of routers)
• Environment (temperatures)
• Routing engine statistics
mikrotik-exporter
• Contribution to an existing project
• Only interface and resource metrics at that point
• Added several other features
• License: BSD3
• Source: https://github.com/nshttpd/mikrotik-exporter
mikrotik-exporter - Features
• Interface metrics (RX bytes, TX bytes, drops, errors, etc.)
• BGP session states
• BGP message counts (updates, withdrawals)
• DHCP leases
• DHCPv6 bindings
• Optical diagnostics
• IPv4/IPv6 pool counts
• System resources (memory, CPU load, etc.)
• Prefix counts per protocol (in RIB)
Dashboard examples
How to combine several exporters?
Mauve Network Overview
Mauve Routing
Alerting
When and how?
How to alert?
What the SRE book has taught us:
https://landing.google.com/sre/book/chapters/monitoring-distributed-systems.html
How to alert? A few examples…
Port saturation:
Upstream session down:
Thank you for your attention.
Special thanks to everyone who contributed to my projects!

