Telenet was looking to centralise their logs to make them easier to search and to simplify troubleshooting of their infrastructure. They've partnered with Kangaroot for the design & implementation and enjoy Enterprise Support from Elastic. During this session, you'll find out how they started this project.
3. – Central location of logs
– To allow easier troubleshooting of infrastructure/apps
– No need to login to different systems to check the logs
> Keep logs longer than allowed by local disk space on app servers
– Implementation
> Partnered with Kangaroot for design/implementation
– Using Ansible for deployment/upgrades
> Enterprise support via Elastic
WHY ELASTIC
4. § Log analysis of F5 access logs
– Graphs/Alerts on average response times for web apps
– Heavily used by Operations
§ VMware logs
– vCenter logs for auditing reasons (Oracle licensing)
– When ESXi crashes you might lose your logs
§ Network & storage device logs
§ Kafka broker monitoring
– {metric,file}beat
§ Monitoring Elastic itself
– Logstash, Filebeat, Elastic nodes, Kibana nodes, Elastic cluster health
§ Application logs for developers to allow easier troubleshooting
– Weblogic, Tomcat, JBoss/WildFly, AEM, …
§ Generate alerts towards enterprise monitoring solution using watches
§ Replacement of GSA with a custom API with Elastic backend
USE CASES
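As a sketch of how such application logs could be picked up at the source, a minimal Filebeat configuration shipping into Kafka might look like this. The paths, hostnames, field values and topic name are illustrative assumptions, not from the talk:

```yaml
# Minimal Filebeat sketch: tail application logs and ship them to Kafka.
# All paths, hostnames and the topic name are illustrative assumptions.
filebeat.inputs:
  - type: log
    paths:
      - /opt/tomcat/logs/catalina.out   # example Tomcat log location
    fields:
      app: webshop                      # hypothetical tag used for routing

output.kafka:
  hosts: ["kafka01:9092", "kafka02:9092"]
  topic: "app-logs"
```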
6. § Logstash
– Shipper layer uses 1 pipeline
– Index layer uses multiple pipelines
> Grok filters for parsing logfiles, need some logging standards
Alternative
> Use native json logging format
– Monitoring via x-pack
> destination: Elastic monitoring cluster
§ Kafka
– Monitoring using filebeat/metricbeat
> destination: Elastic cluster, bypassing Logstash/Kafka
§ Kibana
– Using a coordinating-only node
– Load-balances queries across Elastic nodes
DETAILS
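The two parsing approaches mentioned above could be sketched as a Logstash filter like the following; the log format, type values and field names are assumptions, not Telenet's actual filters:

```
# Logstash filter sketch. Log format, type values and field names are assumed.
filter {
  if [type] == "accesslog" {
    # Grok: parse a plain-text line; this is why logging standards are needed.
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
    }
  } else if [type] == "json_app" {
    # Alternative: the application logs native JSON, so no grok pattern is needed.
    json {
      source => "message"
    }
  }
}
```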
§ Set up new independent cluster on new HW (master nodes, data nodes, Kibana)
§ Set up new Logstash indexer layer using a unique group_id (a different Kafka consumer group)
§ Migrate index patterns, existing roles, index templates, visualizations & dashboards, watches
§ Data sources need no modification
§ Data is ingested to both clusters
– Allows for testing new Hardware without impact on current cluster
– Data migration of older data, if needed, using snapshot/restore
– Minimal to no data migration by running in parallel for time of data retention
– Once done => switch Kibana VIP from old Kibana to new Kibana instance
HW MIGRATION STRATEGY
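The snapshot/restore step above could look roughly like this in the Elasticsearch API; the repository name, filesystem path and index pattern are hypothetical:

```
# Register a shared-filesystem snapshot repository on the old cluster
PUT _snapshot/migration_repo
{
  "type": "fs",
  "settings": { "location": "/mnt/es_snapshots" }
}

# Snapshot the older indices that must move to the new cluster
PUT _snapshot/migration_repo/logs_2018?wait_for_completion=false
{ "indices": "logstash-2018.*" }

# On the new cluster (same repository registered there), restore them
POST _snapshot/migration_repo/logs_2018/_restore
{ "indices": "logstash-2018.*" }
```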
8. § PRD cluster
– 7 physical warm datanodes
– 3 physical hot datanodes
– 3 dedicated virtual master nodes
§ Currently running version 6.5
§ Retention:
– 30 days of data for infrastructure-related logs
– 3 weeks of data for application logs
– Few months for metrics
§ Current replicated datavolume: 32TB
§ Roughly 850 GB/day incoming logs
§ 7000 events/s for F5 access logs => daily replicated volume: 500 to 600 GB/day
§ 3200 events/s for VMware logs => daily replicated volume: 350 GB/day
§ 500 events/s for Metricbeat => monthly replicated volume: 400 GB
NUMBERS
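As a sanity check on the numbers above, the daily replicated volume can be estimated from the event rate and an assumed average event size; the ~455-byte figure below is back-calculated from the quoted F5 numbers, not stated in the talk:

```python
def replicated_gb_per_day(events_per_sec, bytes_per_event, replicas=1):
    """Estimate stored volume per day, including replica copies."""
    daily_events = events_per_sec * 86_400          # seconds per day
    total_bytes = daily_events * bytes_per_event * (1 + replicas)
    return total_bytes / 1e9                        # decimal gigabytes

# F5 access logs: 7000 events/s at an assumed ~455 bytes/event with 1 replica
print(round(replicated_gb_per_day(7000, 455)))      # ~550, inside the quoted 500-600 GB/day range
```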
9. § WATCHER
– Input
> Search (Elastic query)
> Http request
– Trigger
> Time based: when to execute watcher (e.g. every 5min)
– Condition
> Decides whether the action should be executed
– Action to take if condition is met
> log message to file
> send e-mail
> notification to Chat tool (e.g. Slack)
> Call to Webhook
ALERTING
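Put together, a watch combining these four parts might look like this; the index pattern, threshold and e-mail address are made up, and on the 6.x line mentioned earlier the endpoint sits under `_xpack`:

```
PUT _xpack/watcher/watch/error_spike
{
  "trigger":   { "schedule": { "interval": "5m" } },
  "input":     { "search": { "request": {
                   "indices": ["app-logs-*"],
                   "body": { "query": { "match": { "level": "ERROR" } } } } } },
  "condition": { "compare": { "ctx.payload.hits.total": { "gt": 100 } } },
  "actions":   { "notify_ops": { "email": {
                   "to": "ops@example.com",
                   "subject": "Error spike detected" } } }
}
```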
10. § Alerts are typically static
– E.g. CPU usage should be below 90%, response times should be below 0.5s
– Not aware of periodicity, e.g. billing cycle, weekends, …
§ Enter machine learning (ML)
– Creates an ML model that recognizes periodicity, can do forecasting
– Anomaly detection, visually identify anomalies using heatmap
– Simple ML jobs
> based on 1 metric
– Multi metric ML jobs:
> split a single time series into multiple time series based on a categorical field.
INTELLIGENT ALERTS
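The multi-metric split described above corresponds roughly to a detector with a partitioning field in the anomaly job configuration; the field names and bucket span here are illustrative assumptions:

```
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      {
        "function": "mean",
        "field_name": "response_time",
        "partition_field_name": "application"
      }
    ]
  },
  "data_description": { "time_field": "@timestamp" }
}
```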