Martin Moucka [Red Hat] | How Red Hat Uses gNMI, Telegraf and InfluxDB to Gain Network Visibility | InfluxDays NA 2021

Red Hat is a provider of enterprise open source solutions. Its product portfolio includes hybrid cloud infrastructure, middleware, cloud-native applications and automation solutions. Its internal network supports all lines of business across 60+ sites. Discover how Red Hat uses InfluxDB and Flux for better real-time monitoring of its network, to improve performance and better understand utilization.

  1. How Red Hat Uses gNMI, Telegraf and InfluxDB to Gain Network Visibility
     Martin Moucka, Principal Network Engineer, Red Hat
  2. Agenda
     • Introduction
     • Scope
     • Why InfluxDB?
     • Architecture
     • Visualizations
     • Flux
     © 2021 InfluxData Inc. All Rights Reserved.
  3. Red Hat: the world’s leading provider of open source enterprise IT solutions
     • More than 90% of the Fortune 500 use Red Hat products & solutions*
     • ~13,815 employees
     • 105+ offices
     • 40+ countries
     • The first $3 billion open source company in the world
  4. Martin Moucka, Principal Network Engineer, Red Hat
     • With the company for more than 7 years
     • Built network automation around Ansible, using a single source of truth
     • Started the transition to modern monitoring integrated with the network automation
     • Tech lead of the Network Automation & Tools team
     E-mail: mmoucka@redhat.com
  5. Network Monitoring
     Network monitoring provides insight into the network. It monitors the status of network devices (switches, routers, firewalls, etc.) and the status and performance of the network itself. It provides a graphical view of metrics (e.g. link utilization) and/or device status (e.g. up or down), together with alerting when something is out of service.
     Key capabilities of network monitoring:
     • Performance metric visualizations. Monitor the network for performance issues and display the information visually (dashboards), so you can understand network performance at a glance.
     • Network alerts. Alert on any problems that occur: discover issues from monitored data and augment alerts with relevant information, helping support teams respond quickly.
     • Network mapping. Visualize complex network landscapes as a map, including device and network health state.
     • Bandwidth monitoring. Identify where network bandwidth usage is not optimal, and drive decisions to improve utilization.
  6. Scope
     • Juniper, Cisco (WLC, ASA, IOS, UCS, etc.), OpenGear, F5 and Mist
     • Custom probes for synthetic monitoring
     • 60+ sites
     • ~1.6k monitored devices
     • ~14k monitored interfaces
     • 5 collectors
  7. Why InfluxDB?
     • Open source, with enterprise support
     • Efficient data storage
     • Flexibility in integrations and languages
     • Modular agent (Telegraf) with support for JTI (Junos Telemetry Interface)
     • Support for an SQL-like query language (InfluxQL)
     • Flux as a powerful, flexible query language
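Telegraf ultimately writes the collected metrics to InfluxDB as line protocol (`measurement,tag=value field=value timestamp`). A minimal Python sketch of that serialization follows, assuming hypothetical measurement, tag and field names; it covers only the common escaping rules and is not a substitute for an official client library.

```python
def _escape(text: str, chars: str) -> str:
    # Backslash-escape every character listed in `chars`.
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _field_value(value) -> str:
    # bool is checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integer fields carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    if isinstance(value, str):
        # String fields are double-quoted; escape backslashes and quotes.
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    raise TypeError(f"unsupported field type: {type(value).__name__}")


def to_line_protocol(measurement: str, tags: dict, fields: dict, timestamp_ns=None) -> str:
    """Serialize one point; tags are sorted by key, as InfluxDB recommends."""
    if not fields:
        raise ValueError("at least one field is required")
    line = _escape(measurement, ", ")
    for key in sorted(tags):
        line += "," + _escape(key, ",= ") + "=" + _escape(str(tags[key]), ",= ")
    line += " " + ",".join(
        _escape(key, ",= ") + "=" + _field_value(val) for key, val in fields.items()
    )
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


print(to_line_protocol("interface", {"device": "sw1", "name": "ge-0/0/1"},
                       {"in_octets": 1234, "oper_up": True}, 1600000000000000000))
```

Sorting tag keys keeps the series key identical no matter how the caller builds the tag dictionary.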
  8. Solution Architecture (diagram)
     Components: Network Devices; Probes; Distributed Monitoring Services / Storage (Telegraf/Kapacitor/InfluxDB); Visualization; Event Management; Network Automation
     Flows: Check / Send data; Alert; Event; Automation; Manual intervention; Troubleshooting; Fix; Configure; Adding/Removing device; New monitored system/device
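The architecture connects network automation to monitoring: adding or removing a device leads to configuring the collectors. As a minimal sketch of that idea, the snippet below renders Telegraf `inputs.gnmi` blocks from an inventory list. The `Device` record, the addresses, port 57400, the `GNMI_USERNAME`/`GNMI_PASSWORD` environment variables and the choice of the OpenConfig interface-counters subscription are all hypothetical; the option names follow the Telegraf gNMI input plugin's documented settings, but verify them against your Telegraf version.

```python
from dataclasses import dataclass


@dataclass
class Device:
    # Hypothetical inventory record, e.g. exported from a source of truth.
    hostname: str
    address: str
    port: int
    site: str


def render_gnmi_input(device: Device, sample_interval: str = "30s") -> str:
    """Render one [[inputs.gnmi]] block for a single device."""
    return "\n".join([
        "[[inputs.gnmi]]",
        f'  addresses = ["{device.address}:{device.port}"]',
        # Telegraf substitutes ${VAR} from the environment at load time.
        '  username = "${GNMI_USERNAME}"',
        '  password = "${GNMI_PASSWORD}"',
        '  encoding = "proto"',
        '  redial = "10s"',
        "  [inputs.gnmi.tags]",
        f'    device = "{device.hostname}"',
        f'    site = "{device.site}"',
        "  [[inputs.gnmi.subscription]]",
        '    name = "ifcounters"',
        '    origin = "openconfig-interfaces"',
        '    path = "/interfaces/interface/state/counters"',
        '    subscription_mode = "sample"',
        f'    sample_interval = "{sample_interval}"',
    ])


def render_config(devices) -> str:
    # Sort by hostname so regenerated files stay stable and diff cleanly.
    blocks = [render_gnmi_input(d) for d in sorted(devices, key=lambda d: d.hostname)]
    return "\n\n".join(blocks) + "\n"


print(render_config([Device("sw-a", "192.0.2.10", 57400, "site-a")]))
```

Regenerating the whole file from the inventory, rather than patching it, means that removing a device from the source of truth also removes it from monitoring.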
  10. Visualizations - Immediate response
      • Device detailed status
        ◦ Interface utilization (SNMP / gNMI)
        ◦ Interface errors (SNMP / gNMI)
        ◦ CPU/memory utilization (SNMP)
        ◦ BGP neighbor status (SNMP; gNMI in progress)
        ◦ etc.
      • Site view
        ◦ Data from probes (latency, packet loss, HTTP response time, DNS delay)
        ◦ SLI/SLO status (processed by Kapacitor + Flux query)
        ◦ Internet link utilization (processed by Kapacitor)
        ◦ Top talkers (from another tool via REST API)
      • Wireless status
        ◦ Statistics of WLCs/APs and connected clients
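Interface utilization, whether collected over SNMP (e.g. the 64-bit `ifHCInOctets` counter) or gNMI (OpenConfig `in-octets`), is derived from two samples of a monotonically increasing octet counter. A minimal sketch, assuming the link speed is known and at most one counter wrap occurs between samples; a counter reset after a device reboot would also look like a wrap and is not detected here:

```python
COUNTER64_MODULUS = 2**64


def counter_delta(prev: int, curr: int, modulus: int = COUNTER64_MODULUS) -> int:
    """Difference between two counter samples, tolerating a single wrap."""
    return (curr - prev) % modulus


def utilization_pct(prev_octets: int, curr_octets: int, interval_s: float,
                    speed_bps: float, modulus: int = COUNTER64_MODULUS) -> float:
    """Average link utilization (percent) over the sampling interval."""
    if interval_s <= 0 or speed_bps <= 0:
        raise ValueError("interval and speed must be positive")
    bits = counter_delta(prev_octets, curr_octets, modulus) * 8
    return 100.0 * bits / (interval_s * speed_bps)


# 375 MB transferred in 30 s on a 1 Gbit/s link.
print(utilization_pct(0, 375_000_000, 30, 1_000_000_000))
```

The same delta logic applies to error counters; only the normalization differs (errors per second instead of a fraction of link speed).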
  15. Visualizations - Long-term planning
      • Link capacity utilization
      • Status page based on SLI/SLO
      • Wireless AP (Cisco WLC) anomaly detection (Flux)
      • Compliance reporting
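A status page based on SLI/SLO needs a rule that turns raw probe data into a "good" or "bad" verdict. The sketch below assumes one plausible rule: a probe result is good when both latency and packet loss are under fixed thresholds, the SLI is the fraction of good results, and the remaining error budget is measured against the SLO. The thresholds (100 ms, 1% loss) and the 99% SLO are hypothetical placeholders, not values from the deck.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    latency_ms: float
    packet_loss_pct: float


def sli(results, max_latency_ms: float = 100.0, max_loss_pct: float = 1.0) -> float:
    """Fraction of probe results that meet both thresholds."""
    if not results:
        raise ValueError("no probe results")
    good = sum(
        1 for r in results
        if r.latency_ms <= max_latency_ms and r.packet_loss_pct <= max_loss_pct
    )
    return good / len(results)


def error_budget_remaining(sli_value: float, slo: float = 0.99) -> float:
    """Share of the error budget (1 - SLO) still unused; negative means the SLO is breached."""
    budget = 1.0 - slo
    return (budget - (1.0 - sli_value)) / budget


window = [ProbeResult(20.0, 0.0), ProbeResult(150.0, 0.0), ProbeResult(25.0, 0.5)]
print(sli(window), error_budget_remaining(sli(window)))
```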
  18. Flux
      • Provides a very flexible, programmatic way to query data
      • Allows changing data types within a query
      • In the compliance report, we combine up to 5 different measurements
      • Used for anomaly detection of access points with poor SNR across regions
      • Focus where it matters most
      • Allows custom functions
      • Median Absolute Deviation used for anomaly detection
      • Well documented at https://www.influxdata.com/blog/anomaly-detection-with-median-absolute-deviation/
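For reference, the quantities computed by the Flux MAD function on the following slide can be written out as below (the notation is ours). For the $n$ values $x_1, \dots, x_n$ that share one timestamp:

$$m = \operatorname{median}_i(x_i), \qquad d_i = \lvert x_i - m \rvert, \qquad \mathrm{MAD} = k \cdot \operatorname{median}_j(d_j), \quad k = 1.4826$$

$$s_i = \frac{d_i}{\operatorname{median}_j(d_j)} = \frac{k \, d_i}{\mathrm{MAD}}, \qquad x_i \text{ is labeled an anomaly if } s_i \ge \text{threshold}$$

The constant $k \approx 1/\Phi^{-1}(3/4)$ makes MAD a consistent estimator of the standard deviation for normally distributed data. The function uses MAD only to drop timestamps where it is zero; the score itself divides by the unscaled median deviation, so the threshold is expressed in those units.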
  19. Median Absolute Deviation - Function
```flux
import "math"
import "experimental"

// Labels each value as "anomaly" or "normal" relative to the other values at the same _time.
mad = (table=<-, threshold=3.0) => {
    // Median of all values at each timestamp
    data = table |> group(columns: ["_time"], mode: "by")
    med = data |> median(column: "_value")

    // Absolute deviation of each value from its timestamp's median
    diff = join(tables: {data: data, med: med}, on: ["_time"], method: "inner")
        |> map(fn: (r) => ({r with _value: math.abs(x: r._value_data - r._value_med)}))
        |> drop(columns: ["_start", "_stop", "_value_med", "_value_data"])

    // Consistency constant for normally distributed data
    k = 1.4826
    // Median deviation per timestamp; timestamps where MAD = 0 are dropped
    diff_med = diff
        |> median(column: "_value")
        |> map(fn: (r) => ({r with MAD: k * r._value}))
        |> filter(fn: (r) => r.MAD > 0.0)

    // Score = deviation / median deviation, then label by threshold
    output = join(tables: {diff: diff, diff_med: diff_med}, on: ["_time"], method: "inner")
        |> map(fn: (r) => ({r with _value: r._value_diff / r._value_diff_med}))
        |> map(fn: (r) => ({r with level: if r._value >= threshold then "anomaly" else "normal"}))

    return output
}
```
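To make the logic of the Flux `mad` function easier to follow, here is a plain-Python mirror of it, assuming rows arrive as hypothetical `(time, series_id, value)` tuples. One difference to be aware of: Flux's `median()` defaults to an estimated (t-digest) method, while `statistics.median` is exact, so scores can differ slightly on real data.

```python
import statistics
from collections import defaultdict

K = 1.4826  # consistency constant, as in the Flux function


def mad_levels(rows, threshold: float = 3.0):
    """rows: iterable of (time, series_id, value).

    Returns (time, series_id, score, level) for every row whose timestamp
    group has a non-zero MAD, mirroring the Flux function.
    """
    by_time = defaultdict(list)
    for t, series, value in rows:
        by_time[t].append((series, value))

    output = []
    for t in sorted(by_time):
        group = by_time[t]
        med = statistics.median(v for _, v in group)
        diffs = [(s, abs(v - med)) for s, v in group]
        diff_med = statistics.median(d for _, d in diffs)
        if K * diff_med <= 0.0:
            continue  # same effect as filter(fn: (r) => r.MAD > 0.0)
        for s, d in diffs:
            score = d / diff_med  # divides by the unscaled median, as in Flux
            output.append((t, s, score, "anomaly" if score >= threshold else "normal"))
    return output


rows = [(0, "a", 10.0), (0, "b", 11.0), (0, "c", 12.0), (0, "d", 10.0), (0, "e", 50.0)]
print([r for r in mad_levels(rows) if r[3] == "anomaly"])
```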
  20. Median Absolute Deviation - Usage
```flux
// For each access point: ratio of poor-SNR clients to users (CNPR), how long CNPR has
// stayed >= 0.1 (stateDuration), and that duration divided by CNPR, bucketed per hour
pc_duration = from(bucket: "XXXXXX")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r._measurement == "bsnAPTable"
        and r._field =~ /radio1PoorSNRClients|radio1Users/
        and r.region == "${region}")
    |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
    |> filter(fn: (r) => r.radio1PoorSNRClients > 0 and r.radio1Users > 0)
    |> map(fn: (r) => ({r with CNPR: float(v: r.radio1PoorSNRClients) / float(v: r.radio1Users)}))
    |> stateDuration(fn: (r) => r.CNPR >= 0.1, column: "duration")
    |> map(fn: (r) => ({r with _value: float(v: r.duration) / float(v: r.CNPR)}))
    |> filter(fn: (r) => r._value > 0)
    |> truncateTimeColumn(unit: 1h)
    |> toFloat()

// Count anomalous rows per access point
pc_duration
    |> mad(threshold: 10.0)
    |> filter(fn: (r) => r.level == "anomaly")
    |> group(columns: ["APName"])
    |> count()
    |> group()
```
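The per-access-point part of the query above (pivot, CNPR, `stateDuration`, hourly truncation) can be mirrored in Python for one access point, which helps when checking what values reach `mad()`. This is a sketch under stated assumptions: samples are hypothetical `(epoch_seconds, poor_snr_clients, users)` tuples in time order, durations are in seconds (the `stateDuration` default unit), and hours are truncated in UTC.

```python
def poor_snr_scores(samples, cnpr_threshold: float = 0.1):
    """samples: (epoch_seconds, poor_snr_clients, users) for ONE access point, time-ordered.

    Returns (hour_start_epoch, value) pairs, where value = state duration / CNPR.
    """
    out = []
    run_start = None  # timestamp where the current CNPR >= threshold run began
    for ts, poor, users in samples:
        if poor <= 0 or users <= 0:
            continue  # mirrors the filter that runs before stateDuration
        cnpr = poor / users
        if cnpr >= cnpr_threshold:
            if run_start is None:
                run_start = ts
            duration = ts - run_start  # stateDuration: 0 on the first matching row
        else:
            run_start = None
            duration = -1  # stateDuration reports -1 when the predicate is false
        value = duration / cnpr
        if value > 0:
            out.append((ts - ts % 3600, value))  # truncateTimeColumn(unit: 1h)
    return out


print(poor_snr_scores([(0, 1, 10), (60, 2, 10), (120, 3, 10), (240, 1, 20)]))
```

Because the zero-count rows are filtered out before `stateDuration`, they do not interrupt a running state; only a CNPR below the threshold resets it.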
  21. Questions?
  22. Thank You
