
4 - Customer story: Telenet

Telenet was looking to centralise their logs to make them easier to search and to allow easier troubleshooting of their infrastructure. They partnered with Kangaroot for the design and implementation and enjoy Enterprise Support via Elastic. During this session, you'll find out how they started this project.


  1. ELASTICSEARCH @OPEN19, 14 May 2019
  2. OUTLINE
     § Why Elastic?
     § Use cases:
       – Elasticsearch for troubleshooting
       – Elasticsearch for trending (metrics)
       – Monitoring the ELK stack
     § Setup
       – Implementation diagram
       – Details
       – HW migration strategy
       – Numbers
       – Alerting
       – Intelligent alerts
  3. WHY ELASTIC
     – Central location of logs
     – To allow easier troubleshooting of infrastructure/apps
     – No need to log in to different systems to check the logs
       > Keep logs longer than allowed by local disk space on the app servers
     – Implementation
       > Partnered with Kangaroot for design/implementation
         – Using Ansible for deployment/upgrades
       > Enterprise support via Elastic
  4. USE CASES
     § Log analysis of F5 access logs
       – Graphs/alerts on average response times for web apps (query sketch below)
       – Heavily used by Operations
     § VMware logs
       – vCenter logs for auditing reasons (Oracle licensing)
       – When ESXi crashes you might lose your logs
     § Network & storage device logs
     § Kafka broker monitoring
       – {metric,file}beat
     § Monitoring Elastic itself
       – Logstash, Filebeat, Elastic nodes, Kibana nodes, Elastic cluster health
     § Application logs for developers to allow easier troubleshooting
       – WebLogic, Tomcat, JBoss/WildFly, AEM, …
     § Generate alerts towards the enterprise monitoring solution using watches
     § Replacement of GSA with a custom API backed by Elastic
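To make the F5 use case concrete, the query sketch referenced above: a minimal example of the kind of aggregation a response-time graph or alert could be built on. The endpoint, credentials, index pattern (f5-access-*) and field names (vhost, response_time_ms) are placeholders, not Telenet's actual mapping.

    import requests

    ES_URL = "https://elastic.example.net:9200"   # placeholder coordinating node
    AUTH = ("elastic", "changeme")                # placeholder credentials

    # Average response time per web app over the last 15 minutes.
    query = {
        "size": 0,
        "query": {"range": {"@timestamp": {"gte": "now-15m"}}},
        "aggs": {
            "per_app": {
                "terms": {"field": "vhost"},      # assumed keyword field identifying the web app
                "aggs": {"avg_rt": {"avg": {"field": "response_time_ms"}}},  # assumed numeric field
            }
        },
    }

    resp = requests.post(f"{ES_URL}/f5-access-*/_search", json=query, auth=AUTH)
    resp.raise_for_status()
    for bucket in resp.json()["aggregations"]["per_app"]["buckets"]:
        print(f'{bucket["key"]}: {bucket["avg_rt"]["value"]:.1f} ms average')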
  5. § ELK implementation diagram (shipper nodes feeding indexer nodes)
  6. DETAILS
     § Logstash
       – Shipper layer uses 1 pipeline
       – Index layer uses multiple pipelines
         > Grok filters for parsing logfiles, which need some logging standards
         > Alternative: use a native JSON logging format (sketch below)
       – Monitoring via X-Pack
         > Destination: Elastic monitoring cluster
     § Kafka
       – Monitoring using Filebeat/Metricbeat
         > Destination: Elastic cluster, bypassing Logstash/Kafka
     § Kibana
       – Using a coordination-only node
       – Load-balances queries across Elastic nodes
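The native JSON logging alternative mentioned under Logstash can be sketched as follows: when the application writes one JSON object per line, the indexer pipeline can decode it with a plain json filter/codec instead of maintaining grok patterns per log format. The logger name and fields below are illustrative, not a Telenet standard.

    import json
    import logging
    from datetime import datetime, timezone

    class JsonLineFormatter(logging.Formatter):
        """Render every log record as a single JSON line."""
        def format(self, record):
            return json.dumps({
                "@timestamp": datetime.now(timezone.utc).isoformat(),
                "level": record.levelname,
                "logger": record.name,
                "message": record.getMessage(),
            })

    handler = logging.StreamHandler()              # in practice: a file picked up by Filebeat
    handler.setFormatter(JsonLineFormatter())

    log = logging.getLogger("myapp")               # hypothetical application logger
    log.addHandler(handler)
    log.setLevel(logging.INFO)

    log.info("order processed")   # emits {"@timestamp": "...", "level": "INFO", "logger": "myapp", ...}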
  7. HW MIGRATION STRATEGY
     § Set up a new independent cluster on new HW (master nodes, data nodes, Kibana)
     § Set up a new Logstash indexer layer using a unique group_id (different Kafka consumer_id)
     § Migrate index patterns, existing roles, index templates, visualizations & dashboards, watches
     § Data sources need no modification
     § Data is ingested into both clusters
       – Allows testing the new hardware without impact on the current cluster
       – Data migration of older data, if needed, using snapshot/restore (sketch below)
       – Minimal to no data migration by running in parallel for the length of the retention period
       – Once done => switch the Kibana VIP from the old to the new Kibana instance
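For the snapshot/restore bullet, a rough sketch of the REST calls involved, assuming a shared snapshot repository (here called migration_repo) is registered on both clusters; the endpoints, snapshot name and index pattern are placeholders.

    import requests

    OLD = "https://old-cluster.example.net:9200"   # placeholder endpoints
    NEW = "https://new-cluster.example.net:9200"
    AUTH = ("elastic", "changeme")

    # 1. Snapshot the older indices on the existing cluster.
    requests.put(
        f"{OLD}/_snapshot/migration_repo/logs-2019-03",
        json={"indices": "logstash-2019.03.*", "include_global_state": False},
        auth=AUTH,
    ).raise_for_status()

    # 2. Poll GET _snapshot/migration_repo/logs-2019-03 until it reports SUCCESS (omitted here).

    # 3. Restore on the new cluster from the same shared repository;
    #    the target indices must not already exist there (or must be closed).
    requests.post(
        f"{NEW}/_snapshot/migration_repo/logs-2019-03/_restore",
        json={"indices": "logstash-2019.03.*"},
        auth=AUTH,
    ).raise_for_status()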
  8. NUMBERS
     § PRD cluster
       – 7 physical warm data nodes
       – 3 physical hot data nodes
       – 3 dedicated virtual master nodes
     § Currently running version 6.5
     § Retention (cleanup sketch below):
       – 30 days of data for infrastructure-related logs
       – 3 weeks of data for application logs
       – A few months for metrics
     § Current replicated data volume: 32 TB
     § Roughly 850 GB/day of incoming logs
     § 7,000 events/s for F5 access logs => replicated volume: 500 to 600 GB/day
     § 3,200 events/s for VMware logs => replicated volume: 350 GB/day
     § 500 events/s for Metricbeat => replicated volume: 400 GB/month
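The retention figures imply periodic index cleanup, referenced above as the cleanup sketch. A simplified daily job that drops application-log indices older than three weeks could look like this; the index naming convention (applogs-YYYY.MM.dd), endpoint and credentials are assumptions, and in practice a tool such as Curator is often used instead.

    from datetime import datetime, timedelta, timezone
    import requests

    ES_URL = "https://elastic.example.net:9200"    # placeholder endpoint
    AUTH = ("elastic", "changeme")
    RETENTION = timedelta(days=21)                 # 3 weeks for application logs
    cutoff = datetime.now(timezone.utc) - RETENTION

    # Assumes daily indices named applogs-YYYY.MM.dd
    indices = requests.get(
        f"{ES_URL}/_cat/indices/applogs-*?h=index&format=json", auth=AUTH
    ).json()

    for entry in indices:
        name = entry["index"]
        day = datetime.strptime(name.split("-", 1)[1], "%Y.%m.%d").replace(tzinfo=timezone.utc)
        if day < cutoff:
            requests.delete(f"{ES_URL}/{name}", auth=AUTH).raise_for_status()
            print(f"deleted {name}")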
  9. ALERTING
     § WATCHER (example watch sketched below)
       – Input
         > Search (Elastic query)
         > HTTP request
       – Trigger
         > Time based: when to execute the watcher (e.g. every 5 min)
       – Condition
         > Decides whether the actions should run, evaluated against the input's result
       – Actions to take if the condition is met
         > Log message to file
         > Send e-mail
         > Notification to a chat tool (e.g. Slack)
         > Call a webhook
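Putting those four building blocks together, a hedged sketch of what such a watch can look like on a 6.x cluster (the 6.x endpoint is _xpack/watcher; on 7.x+ it is _watcher). The index pattern, field, threshold and monitoring webhook are placeholders, not the actual Telenet watches.

    import requests

    ES_URL = "https://elastic.example.net:9200"    # placeholder endpoint
    AUTH = ("elastic", "changeme")

    watch = {
        # Trigger: run every 5 minutes.
        "trigger": {"schedule": {"interval": "5m"}},
        # Input: search the F5 access logs for slow responses in the last 5 minutes.
        "input": {"search": {"request": {
            "indices": ["f5-access-*"],                          # assumed index pattern
            "body": {"query": {"bool": {"filter": [
                {"range": {"@timestamp": {"gte": "now-5m"}}},
                {"range": {"response_time_ms": {"gte": 500}}},   # assumed field and threshold
            ]}}},
        }}},
        # Condition: only act when slow requests were actually found.
        "condition": {"compare": {"ctx.payload.hits.total": {"gt": 0}}},
        # Actions: log a message and notify the enterprise monitoring tool via a webhook.
        "actions": {
            "log_it": {"logging": {"text": "{{ctx.payload.hits.total}} slow F5 requests"}},
            "notify_monitoring": {"webhook": {
                "scheme": "https",
                "host": "monitoring.example.net",                # placeholder monitoring endpoint
                "port": 443,
                "method": "post",
                "path": "/api/events",
                "body": "{{ctx.payload.hits.total}} slow F5 requests in the last 5 minutes",
            }},
        },
    }

    requests.put(f"{ES_URL}/_xpack/watcher/watch/f5_slow_responses",
                 json=watch, auth=AUTH).raise_for_status()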
  10. INTELLIGENT ALERTS
      § Alerts are typically static
        – E.g. CPU usage should be below 90%, response times should be below 0.5 s
        – Not aware of periodicity, e.g. billing cycle, weekends, …
      § Enter machine learning (ML)
        – Creates an ML model that recognizes periodicity and can do forecasting
        – Anomaly detection: visually identify anomalies using a heatmap
        – Simple ML jobs
          > Based on a single metric
        – Multi-metric ML jobs (job sketch below)
          > Split a single time series into multiple time series based on a categorical field
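As an illustration of the multi-metric case (the job sketch referenced above), an anomaly detection job that models the mean response time and splits it per web app via a partition field. The job name, fields and bucket span are assumptions, and a separate datafeed would still be needed to push data into the job; on 6.x the ML API lives under _xpack/ml.

    import requests

    ES_URL = "https://elastic.example.net:9200"    # placeholder endpoint
    AUTH = ("elastic", "changeme")

    job = {
        "description": "Mean F5 response time, split per web app",
        "analysis_config": {
            "bucket_span": "15m",
            "detectors": [{
                "function": "mean",
                "field_name": "response_time_ms",   # assumed metric field
                "partition_field_name": "vhost",    # splits one series into one per app
            }],
            "influencers": ["vhost"],
        },
        "data_description": {"time_field": "@timestamp"},
    }

    # Create the job; a datafeed pointing at the F5 indices would be defined separately.
    requests.put(f"{ES_URL}/_xpack/ml/anomaly_detectors/f5_response_times",
                 json=job, auth=AUTH).raise_for_status()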
  11. THANK YOU
