MONITORING AND LOGGING IN WONDERLAND
HELP, WHAT IS HAPPENING?

PAUL SEIFFERT
Team Leader at Jimdo, Traveller, Foodie, Runner
@seiffertp
paul.seiffert@gmail.com
WONDERLAND
• Jimdo’s internal PaaS that runs 250 services
• 2500 Docker containers at a time
• 600 deployments per day
WONDERLAND
[Diagram labels: AWS, other service providers, infrastructure automation, APIs, monitoring, logging, CLI tools, other tooling]
WONDERLAND
[Diagram labels: Wonderland API, AWS ECS, EC2 instance, ECS agent, logging daemon, metric daemon]
IMAGINE …
• Your team is responsible for the software component that delivers the websites of 20m customers
• You are on call tonight
4:00 AM
4:01 AM
Partial outage of web delivery component
PAGERDUTY CALLS
• either because a health check failed
• or because a metric exceeded a configured threshold
[Diagram labels: health checks, alert manager, Prometheus]
API HEALTH CHECKS
• All services on Wonderland: Route53 health checks
• Infrastructure components: Pingdom checks

GET /health
HTTP/1.1 200 OK
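
A minimal sketch of what a service-side `/health` endpoint could look like, using only the Python standard library. The `checks` dictionary, the JSON body, and the 503 status for an unhealthy service are assumptions for illustration; the slides only show that a healthy service answers `GET /health` with `200 OK`.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def health_response(checks):
    """Return (status_code, body) for a dict of named boolean checks."""
    healthy = all(checks.values())
    body = json.dumps({"healthy": healthy, "checks": checks}, sort_keys=True)
    return (200 if healthy else 503), body


class HealthHandler(BaseHTTPRequestHandler):
    # Hypothetical dependency checks; a real service would probe its own dependencies.
    checks = {"database": True}

    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        status, body = health_response(self.checks)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Blocks forever; call this from the service's entry point."""
    HTTPServer(("", port), HealthHandler).serve_forever()
```

Keeping the decision in `health_response` separate from the HTTP handler makes it testable without opening a socket.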
WORKER HEALTH CHECKS
• Workers notify a health check service after each execution
  • Prometheus pushgateway
  • cronitor.io
  • healthchecks.io
• If the service is not notified for a certain time, an alert is created
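
The "alert if not notified for a certain time" idea can be sketched as a small dead man's switch. This is an illustration only, not how the Prometheus pushgateway, cronitor.io, or healthchecks.io are implemented; the class name, the `clock` parameter, and the worker names are hypothetical.

```python
import time


class DeadMansSwitch:
    """Tracks worker check-ins and reports workers that have gone quiet."""

    def __init__(self, max_silence_seconds, clock=time.monotonic):
        self.max_silence = max_silence_seconds
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.last_seen = {}

    def notify(self, worker):
        # Called by a worker after each execution.
        self.last_seen[worker] = self.clock()

    def overdue(self):
        # Workers whose last check-in is older than the allowed silence.
        now = self.clock()
        return sorted(
            worker
            for worker, seen in self.last_seen.items()
            if now - seen > self.max_silence
        )
```

An alerting loop would call `overdue()` periodically and raise an alert for every worker it returns.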
SEMANTIC MONITORING / SYNTHETIC MONITORING
Run tests against production periodically, monitor results, and alert on issues
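
One round of such production tests might be structured as below. This is a sketch under assumptions: the check names and the `alert` callback are hypothetical, and the scheduling (cron, a loop, a worker) is left out.

```python
def run_synthetic_checks(checks, alert):
    """Run each named check once; call alert(name, reason) for every failure.

    checks maps a name to a zero-argument callable that returns a truthy
    value on success. Returns the names of the failed checks in order.
    """
    failures = []
    for name, check in checks.items():
        try:
            ok = check()
            reason = None if ok else "check returned a falsy result"
        except Exception as exc:  # a crashing check counts as a failure
            reason = f"{type(exc).__name__}: {exc}"
        if reason is not None:
            failures.append(name)
            alert(name, reason)
    return failures
```

Treating an exception the same way as a failed assertion keeps a broken test from silently hiding a broken feature.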
4:10 AM
Service still running

SERVICE DASHBOARD
GRAFANA
• Each service running on Wonderland automatically has a dashboard showing key metrics for debugging
• Developers can create custom dashboards for more detailed analysis
• Grafana pulls data from Prometheus instances

PROMETHEUS
• Semi-centralized metric system
• Pull-based metric retrieval
• On-the-fly calculation of derived metrics
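
As an illustration of a derived metric, the sketch below computes a per-second rate from raw counter samples. It is a simplified stand-in for what a query engine does at read time, not Prometheus's actual `rate()` implementation, which also extrapolates to the window boundaries; the function name and the sample layout are assumptions.

```python
def per_second_rate(samples):
    """Average per-second increase of a counter over (timestamp, value) samples.

    A drop in value is treated as a counter reset, so the counter is assumed
    to have restarted from zero at that point.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None
```

Because only raw counters are stored, the same samples can later serve other derived views without changing the instrumented service.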
METRICS
• Infrastructure metrics
• System metrics
• Application metrics
INFRASTRUCTURE METRICS
[Diagram labels: Prometheus, CloudWatch exporter, AWS, custom exporters, Wonderland APIs]
EXAMPLES
aws_autoscaling_group_desired_capacity_average{
  auto_scaling_group_name="crims",
  job="cloudwatch_exporter"
}
aws_elb_request_count_sum{
  cluster="crims",
  job="wonderland_elb_exporter",
  service_name="web-prod"
}
SYSTEM METRICS
[Diagram labels: Prometheus, collectd, cAdvisor]
EXAMPLES
container_memory_rss{
  container_label_cluster="crims",
  container_label_container_name="web-prod--web",
  image="web-prod:abc123",
  instance="10.8.4.91:9104",
  job="crims_cadvisor_metrics"
}
collectd_memory{
  instance="10.8.4.42:9103",
  job="crims_collectd_metrics",
  memory="free"
}
APPLICATION METRICS
[Diagram: Prometheus requests GET /metrics from Container A, Container B, …]
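
What a container returns on `GET /metrics` is plain text in the Prometheus exposition format. The sketch below renders a single counter by hand to show that format; in practice an official client library would normally do this. The `code` label and the help text are made up for the example, and label-value escaping is left out for brevity.

```python
def render_metrics(name, help_text, samples):
    """Render one counter in the Prometheus text exposition format.

    samples maps a tuple of (label, value) pairs to the counter value.
    Label values are assumed not to need escaping.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in sorted(samples.items()):
        label_str = ",".join(f'{k}="{v}"' for k, v in labels)
        selector = f"{{{label_str}}}" if label_str else ""
        lines.append(f"{name}{selector} {value}")
    return "\n".join(lines) + "\n"
```

A service would return this string as the response body of its `/metrics` handler.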
WONDERLAND SERVICE DISCOVERY
[Diagram components: Prometheus, Container A, Container B, …, Wonderland Service Discovery, Wonderland API, Service Discovery Downloader. Arrow labels: "locate containers", "get scrape targets", "update config and reload", "scrape metrics"]
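
The "get scrape targets, update config" step could, for example, turn a list of discovered containers into Prometheus target groups. The sketch below emits the target-group structure used by Prometheus's file-based service discovery. Whether the Wonderland downloader writes this format or rewrites the main config is not stated in the slides; the input field names (`service`, `ip`, `port`) are assumptions.

```python
import json
from collections import defaultdict


def to_target_groups(containers):
    """Group discovered containers by service into Prometheus target groups."""
    by_service = defaultdict(list)
    for container in containers:
        by_service[container["service"]].append(
            f'{container["ip"]}:{container["port"]}'
        )
    return [
        {"targets": sorted(targets), "labels": {"service_name": service}}
        for service, targets in sorted(by_service.items())
    ]


def render_targets_file(containers):
    """Serialize the target groups as JSON, ready to be written to disk."""
    return json.dumps(to_target_groups(containers), indent=2)
```

Sorting services and targets keeps the output stable, so an unchanged set of containers does not trigger a needless reload.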
METRIC RETENTION
http_requests_total{instance="10.8.3.101:80"} = 53
http_requests_total{instance="10.8.3.102:80"} = 81
http_requests_total{instance="10.8.3.103:80"} = 2
...
Automatically generated recording rules:
job:http_requests_total:sum = sum(http_requests_total) without (instance)
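
To make the effect of `sum(...) without (instance)` concrete, the sketch below applies the same aggregation to the three sample values from the slide. The `job="web"` label is an assumption added so that a label remains after `instance` is dropped; the helper is an illustration, not Prometheus code.

```python
from collections import defaultdict


def sum_without(samples, drop_label):
    """Sum sample values after removing one label, like PromQL's sum without (...).

    samples is a list of (labels_dict, value) pairs. The result maps the
    remaining labels, as a sorted tuple of pairs, to the summed value.
    """
    totals = defaultdict(float)
    for labels, value in samples:
        remaining = tuple(sorted((k, v) for k, v in labels.items() if k != drop_label))
        totals[remaining] += value
    return dict(totals)


samples = [
    ({"job": "web", "instance": "10.8.3.101:80"}, 53),
    ({"job": "web", "instance": "10.8.3.102:80"}, 81),
    ({"job": "web", "instance": "10.8.3.103:80"}, 2),
]
aggregated = sum_without(samples, "instance")
```

The three per-instance series collapse into a single series per job, which is far cheaper to keep for a long time.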
FEDERATION
[Diagram: Long-term Prometheus scrapes filtered metrics from short-term Prometheus. Retention labels: 32 days, 30 min]
'match[]':
  - '{job="application_metrics", instance=""}'
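
The `match[]` filter ends up as a query parameter on the `/federate` endpoint of the Prometheus being scraped. A matcher such as `instance=""` selects only series without an `instance` label, which here means the aggregated recording-rule results. The sketch below builds such a URL; the host name is a placeholder, not taken from the slides.

```python
from urllib.parse import urlencode


def federate_url(base_url, selectors):
    """Build a /federate URL with one match[] parameter per series selector."""
    query = urlencode([("match[]", selector) for selector in selectors])
    return f"{base_url.rstrip('/')}/federate?{query}"


# Placeholder host; the selector is the one shown on the slide.
url = federate_url(
    "http://short-term-prometheus.example:9090",
    ['{job="application_metrics", instance=""}'],
)
```

Passing the selectors as a list of pairs lets `urlencode` repeat `match[]` once per selector and handle the escaping of braces and quotes.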
[Diagram: Short-term Prometheus holds the per-instance series http_requests_total{instance="10.8.3.101:80"}, http_requests_total{instance="10.8.3.102:80"}, http_requests_total{instance="10.8.3.103:80"}, ... and job:http_requests_total:sum{}; long-term Prometheus scrapes the filtered metrics and holds job:http_requests_total:sum{}]
SERVICE DASHBOARD

4:12 AM
Auto-Scaling broken

LET’S TAKE A LOOK AT THE LOGS
CENTRALISED LOGGING
• Centralised logging is a must-have in a distributed system
• It should be very easy to gather all information that concerns a service
CENTRALISED LOGGING
• Output of all services running on Wonderland is stored centrally
• Optionally logs are parsed with configurable formats

$ cat wonderland.yaml
---
components:
  - name
    image: my-nginx-image
    logging:
      types:
        - access_log
        - error_log_nginx
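
Parsing by configured log type could work roughly like the sketch below: each type name maps to a pattern, and lines that match no configured type are kept as raw messages. The registry, the regular expression (modelled loosely on a common web server access log layout), and the sample line in the usage note are assumptions, not Wonderland's actual formats.

```python
import re

# Hypothetical registry: log type name -> compiled pattern with named groups.
LOG_FORMATS = {
    "access_log": re.compile(
        r'(?P<remote_addr>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
        r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
        r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
    ),
}


def parse_line(line, types):
    """Try each configured type in order; fall back to the raw message."""
    for log_type in types:
        pattern = LOG_FORMATS.get(log_type)
        match = pattern.match(line) if pattern else None
        if match:
            return {"type": log_type, **match.groupdict()}
    return {"type": "raw", "message": line}
```

For a made-up line such as `10.8.3.1 - - [12/Mar/2017:04:01:07 +0000] "GET /health HTTP/1.1" 200 2`, calling `parse_line(line, ["access_log", "error_log_nginx"])` yields a dictionary with fields such as `status` and `path`. Types without a registered pattern are skipped.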
CENTRALISED LOGGING
[Diagram: Docker -> (fluentd protocol) -> Logbeat -> (lumberjack protocol) -> logz.io]

Wonderland Logbeat
• receives logs via fluent protocol,
• parses them,
• adds metadata,
• and streams them to our logging provider logz.io
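
The "adds metadata" and "streams them" steps can be pictured as follows. This is only a sketch: it does not implement the fluent or lumberjack protocols, it uses newline-delimited JSON as a stand-in wire format, and the metadata keys are borrowed from the metric labels shown earlier rather than from Logbeat itself.

```python
import json


def enrich(event, metadata):
    """Attach platform metadata to a parsed log event without overwriting its fields."""
    return {**metadata, **event}


def to_json_lines(events, metadata):
    """Serialize events as newline-delimited JSON, one enriched event per line."""
    return "".join(
        json.dumps(enrich(event, metadata), sort_keys=True) + "\n"
        for event in events
    )
```

Letting fields of the event win over the metadata is a design choice for this sketch; a real pipeline might instead namespace the metadata to avoid collisions altogether.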
THE TRUTH
[Diagram: Docker -> Logbeat -> logz.io (labels: fluentd, lumberjack); Docker -> Log-Stream -> papertrail.com (label: syslog)]
We are in a migration right now.
4:17 AM
You find this log message from the service autoscaler:
Unable to scale-out service "web-delivery". Configured maximum number of instances reached.
4:17 AM
You increase the maximum number of instances:
$ cat wonderland.yaml
[…]
auto-scaling:
  min-instances: 60
  max-instances: 150
4:20 AM
Back to bed
2:00 PM
In the PMA for last night’s incident, you create the following action item:
Monitor the number of instances of web-delivery to detect potential breaches of auto-scaling limits before they affect the system’s health
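
The action item boils down to comparing the current instance count with the configured maximum and warning before the limit is hit. A minimal sketch, assuming a warning threshold of 80% of `max-instances` (the threshold and the function name are not from the talk):

```python
def scaling_headroom_alert(current_instances, max_instances, warn_ratio=0.8):
    """Return True when the instance count is close to the configured maximum."""
    if max_instances <= 0:
        raise ValueError("max_instances must be positive")
    return current_instances / max_instances >= warn_ratio
```

With the limit of 150 from the incident, 130 running instances would trigger the warning while 60 would not. The same comparison could equally be written as an alerting rule on the metrics the platform already collects.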
QUESTIONS?
THANK YOU
Open positions:
• Senior Infrastructure Engineer
• Senior Backend Engineer
• Senior Frontend Engineer
jobs@jimdo.com
FURTHER READING / SOURCES
• Beyer, Jones, Petoff & Murphy: Site Reliability Engineering
• Susan Fowler: Production-Ready Microservices
• Sam Newman: Building Microservices
• Stripe / Increment: On-Call (https://increment.com/on-call/)
• Mathias Lafeldt & Paul Seiffert: A Journey Through Wonderland (https://speakerdeck.com/mlafeldt/a-journey-through-wonderland)
PHOTOS
• Marcel Stockmann: https://www.flickr.com/photos/marcelstockmann/33068471286
• Michael Theis: https://www.flickr.com/photos/huskyte/6931056896
