Kurt Schneider [Discover Financial] | How Discover Modernizes Observability with InfluxDB Cloud | InfluxDays Virtual Experience NA 2020

  1. Discover Financial Technology. Kurt Schneider. InfluxData Journey.
  2. InfluxData Journey: the selection process and rollout of InfluxDB (Telegraf) at Discover Financial.
     • Kurt Schneider, Domain Architect, Observability
     • 30+ years in monitoring: 27 years at a large insurance company, 5 years at Discover Financial
  3. Discover Financial - Timeline
     • Jan 2019: we decided to replace CA UIM with another toolset.
     • CA UIM was our infrastructure monitoring tool: an agent-based, on-prem solution (the old Nimsoft).
     • We looked at SignalFx, IBM, Datadog, InfluxData, and a new UIM solution.
     • We ran proofs of concept (POCs) with Datadog and InfluxDB installed.
     • We chose InfluxData as our partner and signed the contract in Dec 2019.
  4. RFP Process – Reasons for Selection
     • Price
     • Technology
     • Datadog gaps were the same
     • No callback process
     • Similar tools: both agents are written in Go
     • InfluxDB data retrieval was fastest
     • Datadog's GUI and alerting were more refined
     • We were not looking for APM or synthetics (Datadog has many other capabilities we were not looking for).
  5. Discover 2020-2021 Migration Plan (diagram)
     • Synthetic Monitoring: Web, API, DNS, SSL, HTML, TCP, NTP; on-prem and cloud testing
     • Infrastructure Monitoring: Windows, AIX, Unix, logs, network
     • Application Performance Management and Real User Monitoring
     • Event Management
     • Machine Learning / AIOps
     • Event Portal
     • Strategy: Shared API (2020); API Communication (label appears four times)
     • Status labels: 3/30/21; In Progress; Live 10/1/19
     • Callouts: 700+ daily users; 18K JVM installed OCP
  6. InfluxData replacing CA UIM and Ganglia
     • SaaS solution
     • InfluxDB: a time series database designed from the ground up to collect time series data such as server metrics. Provably the fastest database for these types of metrics.
     • Hundreds of extensions
     • DevOps tool with easy access to metrics
     • A time series database is a software system that is optimized for storing and serving time series through associated pairs of time and value. In some fields, time series may be called profiles, curves, traces, or trends.
  7. InfluxData Architecture, Discover Financial (diagram)
     • Users; OKTA (label appears twice)
     • AWS-hosted InfluxData: Chronograf UI, InfluxDB Cloud, Kapacitor alert engine
     • Influx Gateway
     • Telegraf forwarders (NA): vclp006888.na.discoverfinancial.com, vclp006889.na.discoverfinancial.com
     • Telegraf forwarders (RW): vclp007382.rw.discoverfinancial.com, vclp007381.rw.discoverfinancial.com
     • Enterprise servers: Telegraf agents on Windows, Linux, and AIX (8000+)
     • Port labels: TCP port 48048; TCP port 9092
  8. InfluxData: Hundreds of Input and Output Plugins Available
  9. Telegraf Current Collection Progress (legend: Using Today; Coming Soon, Actively Investigating; Future)
  10. Enterprise Server Metrics
     • Over 100 metrics out of the box
     • Sample collected: Linux (8000+), AIX (300+), and Windows (1200+); VMware (3 vSphere); Citrix; Active Directory
     • Over 10,000 Telegraf agents are reporting to our cloud instance.
     • We can alert on any combination of these metrics.
     • We will be offering some self-service capabilities and interested-party notification capabilities.
  11. Moogsoft Integration
  12. Chronograf Usage
  13. Chronograf UI – Time Series in Real Time
  14. Chronograf Forwarder Health
  15. Telegraf Average Usage across Enterprise
  16. Kapacitor
  17. Grafana Usage
  18. Grafana - Work from Home Dashboard
  19. Grafana / InfluxDB Linux Dashboards: Hundreds of Community Dashboards Available
  20. vSphere Visualization with Telegraf Data
  21. Maturity Model for Infrastructure and Platforms
     • Level 1, Deployment: out-of-the-box configuration; agents deployed; top 10 KPIs monitored (CPU, disk, processes, etc.)
     • Level 2, Standard: integrated with event manager; metrics, events, logs, and traces; real-time visualization; application process monitors; infrastructure tied to applications (CMDB)
     • Level 3, Preventative and Self Healing: automatically kick off orchestration; baseline monitoring and anomaly detection
     • Activities: learn the tool; work from tickets and calls; use tool to root cause; tune tool; ongoing maintenance; adding new StateBehaviors
  22. Tips and Tricks
     • Forwarder configuration is important (we have four forwarders for 10K Telegraf agents):
       • metric_batch_size = 30000
       • metric_buffer_limit = 500000
       • collection_jitter = "0s"
       • flush_interval = "1s"
       • flush_jitter = "0s"
     • Batching is important when you have a lot of data ingestion.
     • Don't duplicate configurations anywhere.
     • The forwarders need to run a different local configuration as well; otherwise they risk forwarding data that is being filtered.
     • Telegraf: choose metrics that matter. There are thousands of metrics in attribute groups; collect the valuable ones.
  23. Questions
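
The forwarder settings from the Tips and Tricks slide could sit in a Telegraf configuration file roughly as sketched below. Only the five `[agent]` values come from the slides. The choice of the `influxdb_listener` input and `influxdb` output plugins, the use of port 48048 for agent-to-forwarder traffic (the port label appears on the architecture slide, but the slide text does not say which link it belongs to), the gateway URL, and the database name are all assumptions for illustration, not Discover's actual configuration.

```toml
# Hypothetical forwarder telegraf.conf sketch.
# The five [agent] values are quoted from the slide; the rest is assumed.
[agent]
  metric_batch_size = 30000     # metrics sent to outputs per write
  metric_buffer_limit = 500000  # metrics held in memory per output while writes are pending or failing
  collection_jitter = "0s"
  flush_interval = "1s"         # flush often so the buffer drains quickly
  flush_jitter = "0s"

# Assumption: agents send line protocol to the forwarder over HTTP.
# The port is the "TCP Port 48048" label from the architecture slide.
[[inputs.influxdb_listener]]
  service_address = ":48048"

# Assumption: the forwarder relays everything to a single upstream endpoint.
# The URL and database name are placeholders.
[[outputs.influxdb]]
  urls = ["https://influx-gateway.example.invalid:8086"]
  database = "telegraf"
```

Under this sketch, each agent would point its own `[[outputs.influxdb]]` `urls` at a forwarder, and any agent-side filters (for example `namepass` or `fieldpass`) would stay out of the forwarder's file, in line with the slide's advice that forwarders run a different local configuration.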
