How bol.com makes sense of its logs, using the Elastic technology stack.
Renzo Tomà, bol.com, Oct. 29 2015

Pleased to meet you
• Renzo Tomà
• IT Operations engineer at bol.com, a large webshop in the Netherlands and Belgium
• Product owner & tech lead for 2 platforms: metrics & logsearch
• Open source user + contributor
• Husband and dad of 2 cool kids!

bol.com & ELK
ELK powers a Logsearch platform (“grep on steroids”).
Log events from many layers of our infrastructure.
A central user interface for querying: Kibana.
For software developers, system engineers & our security team (~300 potential users).
Supports cooperation between development and operations (shared Kibana dashboards = one single truth).
Bottom line: faster incident resolution = less revenue loss.

ELK as a first-class citizen
ELK has been a first-class citizen since our datacenter rebuild went live in 2014.
Getting feeds from:
• 3 datacenters
• 5 frontend apps, 80+ services
• lots of databases
Log types: Apache and Tomcat access logging, Log4j, PostgreSQL, Oracle, syslog, …
Numbers:
• 1600+ servers emitting log events
• 500-600 million events per day, indexing peaks at 25k/sec
• 23 billion events stored, 14TB * 2 on disk
• We keep 90 days available for search.
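The figures above can be turned into a quick sizing check. The sketch below is plain arithmetic on the slide's numbers; it assumes (the slide does not say so) that "14TB * 2" means 14 TB per copy, stored twice as a primary plus one replica.

```python
# Back-of-the-envelope sizing from the slide's figures.
# Assumption: "14TB * 2" = 14 TB per copy, stored twice (primary + one replica).

SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(events_per_day: float) -> float:
    """Average indexing rate in events/second for a given daily volume."""
    return events_per_day / SECONDS_PER_DAY


def bytes_per_event(total_bytes: float, events: float) -> float:
    """Average on-disk size of one stored event (single copy)."""
    return total_bytes / events


low = average_rate(500e6)             # about 5,787 events/s
high = average_rate(600e6)            # about 6,944 events/s
peak_factor = 25_000 / high           # the 25k/s peak relative to the high average
size = bytes_per_event(14e12, 23e9)   # about 609 bytes per event, per copy

print(f"average rate: {low:,.0f}-{high:,.0f} events/s")
print(f"peak is ~{peak_factor:.1f}x the average")
print(f"~{size:.0f} bytes per stored event (per copy)")
```

In other words, the indexing pipeline has to absorb bursts of roughly 3.6 times the average daily rate.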

Our high-level design

Great, but how do those events get into Redis?

Struggles in log shipping
In 2013: tail files & ship lines to Logstash over UDP. Lots of grokking.
A single Logstash instance could not process the feed in real time => data loss, incomplete events.
Need for speed & simplicity!
• Scale out Logstash. Use Redis as a message bus to feed multiple Logstash instances.
• Reduce the need for complex grok patterns. Emit events in a structured format.
In 2015: events are converted into JSON docs at the source. Our shippers run inside JVMs and databases.
Logstash reads from Redis and decodes the events. No more grokking.
Logstash out of work? No: cleanup, enrichment (IP geolocation) and metrics generation (lag, throughput).
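A minimal sketch of the 2015 pipeline shape, assuming a Python `deque` as a stand-in for a Redis list (no Redis server involved): shippers serialize each event to JSON at the source and append it (like `RPUSH`), and several indexers pop from the same list (like `LPOP`/`BLPOP`). Adding consumers adds throughput, and each event is taken by exactly one of them. The `indexed_by` field is hypothetical and only shows which worker took an event.

```python
from __future__ import annotations

import json
from collections import deque

# Stand-in for one Redis list key; a real shipper would RPUSH to Redis and
# every Logstash instance would pop from that same key.
queue: deque[str] = deque()


def ship(event: dict) -> None:
    """Shipper side: serialize at the source, then append (like RPUSH)."""
    queue.append(json.dumps(event))


def consume(worker: str) -> dict | None:
    """Indexer side: pop the oldest entry (like LPOP) and decode it; no grok needed."""
    if not queue:
        return None
    event = json.loads(queue.popleft())
    event["indexed_by"] = worker  # hypothetical field, only to show who took it
    return event


for i in range(4):
    ship({"@message": f"request {i}", "@fields": {"response": 200}})

# Two workers take turns draining the list; each event is consumed exactly once.
workers = ["logstash-1", "logstash-2"]
results = []
turn = 0
while (event := consume(workers[turn % len(workers)])) is not None:
    results.append(event)
    turn += 1
```

Because the structure is fixed at the source, the consumer side is a single `json.loads` rather than a grok pattern per log type.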

The logshippers we use
Application server access logging (Tomcat):
Inside Tomcat: convert ‘hits’ into JSON docs and send them to Redis: https://github.com/bolcom/redis-log-valve
Java application logging (Log4j):
Inside the JVM: convert events into JSON docs and send them to Redis:
https://github.com/bolcom/log4j-jsonevent-layout + https://github.com/bolcom/log4j-redis-appender
Webserver access logging (Apache):
• Custom LogFormat to output each ‘hit’ as JSON: http://untergeek.com/2013/09/11/getting-apache-to-output-json-for-logstash-1-2-x/
• Apache sends the JSON docs to an external process, which sends them to Redis.
Docker logging:
Shipper container: subscribes to the logs of all running containers, converts events into JSON docs and sends them to Redis: https://github.com/bolcom/logspout-redis-logstash
Oracle logging:
Inside the database: a custom PL/SQL package with an API creates JSON docs and sends them to Redis.
PostgreSQL logging:
Inside the database: hooks into logging, converts events into JSON docs and sends them to Redis: https://github.com/2ndquadrant-it/redislog
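For the Apache path, one way to build the "external process" is Apache's piped logging (`CustomLog "|/path/to/program" <format>`), which writes each formatted log line to the program's stdin. The sketch below is a hypothetical forwarder, not bol.com's tool. It reads one JSON doc per line and drops lines that do not parse (Apache escapes some bytes as `\xhh`, which is not valid JSON). It then hands each doc to a `push` callback that, in a real setup, would push to Redis.

```python
import io
import json
from typing import Callable, TextIO


def forward(stream: TextIO, push: Callable[[str], None]) -> int:
    """Forward one JSON doc per input line; return how many were forwarded.

    Lines that are empty, not valid JSON, or not a JSON object are skipped,
    so one bad line does not kill the piped-log process.
    """
    sent = 0
    for line in stream:
        line = line.strip()
        if not line:
            continue
        try:
            doc = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(doc, dict):
            push(json.dumps(doc))  # in production: push to Redis instead
            sent += 1
    return sent


# Apache would feed sys.stdin; a StringIO stands in for it here.
sample = io.StringIO('{"verb": "GET"}\nnot json\n\n{"verb": "POST"}\n[1, 2]\n')
forwarded = []
count = forward(sample, forwarded.append)
print(count)  # 2
```

Skipping bad lines instead of raising matters here: if a piped-log program exits, Apache has to restart it, and lines written in the meantime are at risk.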

Special sauce 1/2: the call stack
Each Webshop request gets tagged with a Request ID.
The Webshop is connected to 25 services. The Request ID gets attached to all service calls.
It gets logged in many places.
Correlation time!
Search for a Request ID and see:
• the initial Webshop request
• all service calls made
Including: order, parameters, status codes and response times.
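In Kibana this is a search on the Request ID, sorted by time. The sketch below applies the same filter and sort to a few in-memory events. The field names (`request_id`, `path`) and the `pricing` role are assumptions. Sorting by timestamp only approximates the real call order, because each service logs independently.

```python
from operator import itemgetter

# Hypothetical events; field names are assumptions modelled on the
# Tomcat access log event shown later (role, response, time_in_msec, timestamp).
events = [
    {"request_id": "r-1", "role": "catalog", "path": "/v1/get-product/1",
     "response": 200, "time_in_msec": 2, "timestamp": 1003},
    {"request_id": "r-2", "role": "webshop", "path": "/",
     "response": 200, "time_in_msec": 40, "timestamp": 1001},
    {"request_id": "r-1", "role": "webshop", "path": "/p/1",
     "response": 200, "time_in_msec": 35, "timestamp": 1000},
    {"request_id": "r-1", "role": "pricing", "path": "/v1/price/1",
     "response": 503, "time_in_msec": 12, "timestamp": 1005},
]


def call_stack(all_events, request_id):
    """All events carrying one Request ID, ordered by their timestamp."""
    hits = [e for e in all_events if e["request_id"] == request_id]
    return sorted(hits, key=itemgetter("timestamp"))


for e in call_stack(events, "r-1"):
    print(f'{e["timestamp"]} {e["role"]:<8} {e["path"]:<20} '
          f'{e["response"]} {e["time_in_msec"]}ms')
```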

Special sauce 2/2: the service map
We have 5 frontend applications and 80+ services. Services call services.
New services get introduced. New connections are made. Canary releases. A/B testing…
It's a living distributed architecture.
We need a map we can trust!
Let’s build a directed graph.
• Use the Tomcat access logging
• Add “A called B” information
• Elasticsearch aggregation query
• Transform the result and draw the graph

Tomcat access log events
An event is emitted for every request that a Tomcat Java application processes:
{
  "@message": "/v1/get-product/987654321",
  "@source_host": "pro-catalog-001",
  "@fields": {
    "agent": "curl/7.43.0",
    "role": "catalog",
    "verb": "GET",
    "time_in_msec": 2,
    "response": 200,
    "bytes": 75,
    "client": "10.0.0.1",
    "httpversion": "HTTP/1.1",
    "time_in_sec": 0,
    "timestamp": 1443101965498
  }
}

Enrich events with external data
We create a lookup table for our whole datacenter IP space:
"10.0.0.1": "webshop"
"10.0.0.2": "catalog"
…
Add a new field, using the Logstash 'translate' filter:
filter {
  translate {
    dictionary_path => "ip-to-role-mapping.yaml"
    field => "client"
    destination => "client_role"
  }
}
That's all we need.
{
  "@message": "/v1/get-product/987654321",
  "@source_host": "pro-catalog-001",
  "@fields": {
    "agent": "curl/7.43.0",
    "role": "catalog",
    "verb": "GET",
    "time_in_msec": 2,
    "response": 200,
    "bytes": 75,
    "client": "10.0.0.1",
    "client_role": "webshop",
    "httpversion": "HTTP/1.1",
    "time_in_sec": 0,
    "timestamp": 1443101965498
  }
}
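The enrichment step is a dictionary lookup. Below is a rough Python equivalent of the `translate` filter above. It assumes the event layout shown on the slide and uses a hypothetical in-code copy of the mapping file: look up `client` and, on a match, write `client_role`. Unmatched IPs are left without `client_role`. Our understanding is that the `translate` filter behaves the same way unless its `fallback` option is set.

```python
# Hypothetical copy of ip-to-role-mapping.yaml (in Logstash the translate
# filter loads this file via dictionary_path).
IP_TO_ROLE = {
    "10.0.0.1": "webshop",
    "10.0.0.2": "catalog",
}


def enrich(event: dict, table: dict = IP_TO_ROLE) -> dict:
    """Return a copy of event with @fields.client_role looked up from @fields.client."""
    fields = dict(event.get("@fields", {}))
    role = table.get(fields.get("client"))
    if role is not None:  # no match: leave client_role unset
        fields["client_role"] = role
    return {**event, "@fields": fields}


event = {
    "@message": "/v1/get-product/987654321",
    "@source_host": "pro-catalog-001",
    "@fields": {"role": "catalog", "client": "10.0.0.1"},
}
print(enrich(event)["@fields"]["client_role"])  # webshop
```

With `role` (the service handling the request) and `client_role` (the service that sent it) on every event, each access log line now records one "A called B" edge.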

Searching & transforming
# search query
{
  "size": 0,
  "query": { … },
  "aggs": {
    "_apps_": {
      "terms": {"field": "role"},
      "aggs": {
        "_clients_": {
          "terms": {"field": "client_role"}
        }
      }
    }
  }
}
# search result
{
  "hits": { … },
  "aggregations": {
    "_apps_": {
      "buckets": [
        {
          "_clients_": {
            "buckets": [
              {
                "key": "webshop",
                "doc_count": 1234
              },
              …
            ]
          },
          "key": "catalog",
          …
        }
      ]
    }
  }
}
# dot file
digraph {
  node [shape=box];
  "webshop" -> "catalog" [label=1234];
  "abc" -> "foo" [label=42];
  "foo" -> "bar" [label=13];
  …
}
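The transform step might look like the sketch below. It walks the two-level `terms` aggregation (outer buckets are the called application, `role`; inner buckets are the caller, `client_role`) and writes one labelled edge per pair. The function names and sample data are illustrative. One caveat: a `terms` aggregation returns only the top 10 buckets by default, so with 80+ services the query would likely need a larger `size` on both levels.

```python
def quote(name: str) -> str:
    """Quote a node name for DOT, escaping embedded double quotes."""
    return '"' + name.replace('"', '\\"') + '"'


def to_dot(result: dict) -> str:
    """Build a Graphviz digraph from the role -> client_role aggregation result.

    Edge direction is caller -> callee; the label is the request count.
    """
    lines = ["digraph {", "  node [shape=box];"]
    for app in result["aggregations"]["_apps_"]["buckets"]:
        for client in app["_clients_"]["buckets"]:
            lines.append(
                f'  {quote(client["key"])} -> {quote(app["key"])} '
                f'[label={client["doc_count"]}];'
            )
    lines.append("}")
    return "\n".join(lines)


# Sample shaped like the search result above (values illustrative).
sample = {
    "aggregations": {
        "_apps_": {
            "buckets": [
                {
                    "key": "catalog",
                    "doc_count": 1234,
                    "_clients_": {"buckets": [{"key": "webshop", "doc_count": 1234}]},
                }
            ]
        }
    }
}

print(to_dot(sample))
# digraph {
#   node [shape=box];
#   "webshop" -> "catalog" [label=1234];
# }
```

The resulting text can be rendered with Graphviz, for example `dot -Tpng services.dot -o services.png`.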

That makes sense! (Sort of …)
Names have been obfuscated. Sorry.

That makes sense!
Renzo Tomà
rtoma@bol.com
Thanks!
