Monitoring a cloud CI system,
using Jenkins as an example
Alexander Akbashev
HERE Technologies
Here Technologies
HERE Technologies, the Open Location Platform company, enables
people, enterprises and cities to harness the power of location. By
making sense of the world through the lens of location we empower
our customers to achieve better outcomes – from helping a city
manage its infrastructure or an enterprise optimize its assets to
guiding drivers to their destination safely.
To learn more about HERE, including our new generation of cloud-based
location platform services, visit http://360.here.com and www.here.com
Context
• Every change goes through pre-submit validation
• Feedback time is 15-40 minutes
• A lot of products and platforms
• 6 Jenkins masters
• Up to 185k runs per day in the biggest one
• 20k runs per day on average
If something goes wrong…
What can go wrong?
Compilation is broken
Tests are broken
Network issues
Jenkins master crashed
EC2 plugin does not raise new nodes
No connection to labs
Cannot clean up workspace
AWS S3 is down
Git master dies
Git replica is broken
Compiler cache was invalidated
Hit the limit of API calls to AWS
Job was deleted
UI is blocked
Queue is too big
System.exit(1)
NFS stuck
Deadlock in Jenkins
Staging started to give feedback
Restarted the wrong server
Monitoring Jenkins
Out of the box
© http://www.jenkinselectric.com/monitoring
https://jenkins.io/doc/book/system-administration/monitoring/
https://wiki.jenkins.io/display/JENKINS/Monitoring
Monitoring Plugin (March 2016)
+ Easy to install
+ Nothing to maintain
- If Jenkins is slow, there is no monitoring
- Monitors mainly JVM stats
- Only one instance
- Not scalable
Monitoring Plugin (nowadays)
+ Easy to install
+ Nothing to maintain
- If Jenkins is slow, there is no monitoring
- Monitors mainly JVM stats
- Only one instance
- Not scalable
+ InfluxDB/CloudWatch/Graphite
Let's build our own monitoring!
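Designing our own monitoring (March 2016)
Jenkins --(API)-- Python --(API)-- InfluxDB
import jenkins  # python-jenkins
from influxdb import InfluxDBClient

j = jenkins.Jenkins("http://jenkins.host")
# Host and database follow the FluentD config later in the deck; "queue" is an assumed name.
influx = InfluxDBClient(host="influxdb.host", port=8086, database="stats")

# Poll the build queue and record why each item is still waiting.
for q in j.get_queue_info():
    influx.write_points([{
        "measurement": "queue",
        "tags": {"name": q["task"]["name"]},
        "fields": {"reason": q.get("why") or ""},
    }])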
Designing our own monitoring (March 2016)
Jenkins --(API)-- Python --(API)-- InfluxDB
+ Simple
+ Worked for 18 months
- Polling
- Common code to maintain
- Not all data is accessible
- Extra load on Jenkins
Let's do event-based monitoring!
Jenkins Core
// Excerpt: callbacks Jenkins invokes at each stage of a build's life.
public abstract class RunListener<R extends Run> implements ExtensionPoint {
    public void onCompleted(R r, TaskListener listener) {}
    public void onFinalized(R r) {}
    public void onStarted(R r, TaskListener listener) {}
    public void onDeleted(R r) {}
}
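The difference from polling can be sketched in a few lines of Python: the monitor registers a callback once and is invoked when a build changes state, instead of asking Jenkins for its state on a timer. This is only an illustration of the listener pattern; `Run`, `RunEvents`, `subscribe` and `fire` are hypothetical names that mirror the shape of `RunListener`, not Jenkins code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Run:
    """Hypothetical stand-in for a Jenkins build."""
    job_name: str
    number: int


class RunEvents:
    """Tiny event hub: listeners register once and are called on each event."""

    def __init__(self) -> None:
        self._listeners: Dict[str, List[Callable[[Run], None]]] = {}

    def subscribe(self, event: str, listener: Callable[[Run], None]) -> None:
        self._listeners.setdefault(event, []).append(listener)

    def fire(self, event: str, run: Run) -> None:
        for listener in self._listeners.get(event, []):
            listener(run)


# The monitor does no polling; it only reacts when the hub calls it.
seen: List[str] = []
events = RunEvents()
events.subscribe("onFinalized", lambda run: seen.append(f"{run.job_name}#{run.number}"))

events.fire("onStarted", Run("build-app", 7))    # nobody subscribed: ignored
events.fire("onFinalized", Run("build-app", 7))  # the listener is invoked
print(seen)
```

The monitoring side pays a cost only when something actually happens, which is the property the polling script lacked.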
Groovy Event Listener Plugin (April 2016)
• Runs custom Groovy code on every event
• Supports RunListener
Groovy Event Listener Plugin (nowadays)
• Runs custom Groovy code on every event
• Supports RunListener, ComputerListener, ItemListener, QueueListener
• Works at scale
• Allows a custom classpath
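Groovy Event Listener Plugin
import jenkins.metrics.impl.TimeInQueueAction  // provided by the Metrics plugin

if (event == 'RunListener.onFinalized') {
    // The build attached to the current executor thread
    def build = Thread.currentThread().executable
    // Time the build spent waiting in the queue
    def queueAction = build.getAction(TimeInQueueAction.class)
    def queuing = queueAction.getQueuingDurationMillis()
    log.info "number=${build.number}, queue_duration=${queuing}"
}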
OK, we have events, but how do we get them into the database?
FluentD
• Processes 13,000 events/second/core
• Retry/buffer/routing
• Easy to extend
• Simple
• Reliable
• Memory footprint is 30-40 MB
• Written in Ruby
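A rough back-of-the-envelope check, in Python, of how the run volume from the Context slide compares with the quoted FluentD throughput. The number of events emitted per run is not given in the talk; the value of 10 below is an assumption for illustration only, and the result is a daily average, so bursts will be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60

runs_per_day = 185_000            # peak on the biggest master (Context slide)
fluentd_events_per_sec = 13_000   # per core (FluentD slide)
events_per_run = 10               # assumption, not a figure from the talk

runs_per_sec = runs_per_day / SECONDS_PER_DAY
events_per_sec = runs_per_sec * events_per_run
core_share = events_per_sec / fluentd_events_per_sec

print(f"{runs_per_sec:.2f} runs/s, {events_per_sec:.1f} events/s, "
      f"{core_share:.2%} of one core's quoted throughput")
```

Under these assumptions the event stream uses well under one percent of a single core's quoted capacity, which suggests throughput is unlikely to be the limiting factor.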
FluentD
Jenkins --(JSON)-- FluentD --(JSON)-- InfluxDB
                   FluentD --(SQL)--- Postgres
                   FluentD ---------- Logs
FluentD. Config.
<match **.influx.**>
  type influxdb
  host influxdb.host
  port 8086
  dbname stats
  auto_tags "true"
  timestamp_tag timestamp
  time_precision s
</match>
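To make the effect of this config more concrete, here is a small Python sketch of how a flat JSON event might be split into an InfluxDB point. It assumes that `auto_tags` turns string values into tags and everything else into fields, and that the key named by `timestamp_tag` is used as the point's time; both are our reading of the option names rather than confirmed plugin behaviour. `to_point` and the sample event are hypothetical.

```python
from typing import Any, Dict


def to_point(measurement: str, record: Dict[str, Any],
             time_key: str = "timestamp") -> Dict[str, Any]:
    """Split a flat JSON event into an InfluxDB-style point (tags vs. fields)."""
    record = dict(record)  # do not mutate the caller's event
    point: Dict[str, Any] = {"measurement": measurement, "tags": {}, "fields": {}}
    if time_key in record:
        point["time"] = record.pop(time_key)
    for key, value in record.items():
        # Assumed auto_tags behaviour: strings become tags, the rest become fields.
        target = "tags" if isinstance(value, str) else "fields"
        point[target][key] = value
    return point


event = {"name": "build-app", "number": 42, "cause": "Jar Hell", "timestamp": 1500000000}
point = to_point("bfa", event)
print(point)
```

If this reading is right, the practical consequence is that the type of each value in the Groovy listener decides whether it can be used for grouping (tags) or only for aggregation (fields).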
OK, we have events and we have FluentD, but how do we pass events to it?
FluentD Plugin for Jenkins
• Developed at HERE Technologies
• Very simple
• Supports JSON
• Post-build step
https://github.com/jenkinsci/fluentd-plugin
Great! Let's do something with this data!
Infra issues
Build Failure Analyzer (config)
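Build Failure Analyzer (code)
import com.sonyericsson.jenkins.plugins.bfa.model.FailureCauseBuildAction

// build, jobName, node and context are assumed to be set earlier in the listener script.
def bfa = build.getAction(FailureCauseBuildAction.class)
// getAction returns null when the build carries no failure-cause action.
def causes = bfa ? bfa.getFailureCauseDisplayData().getFoundFailureCauses() : []
for (def cause : causes) {
    final Map<String, Object> data = new HashMap<>()
    data.put("name", jobName)
    data.put("number", build.number)
    data.put("cause", cause.getName())
    data.put("categories", cause.getCategories()?.join(',') ?: '')
    data.put("timestamp", build.timestamp.timeInMillis)
    data.put("node", node)
    // The "influx" part of the tag lets the **.influx.** rule in the FluentD config route it.
    context.logger.log("influx.bfa", data)
}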
Build Failure Analyzer (result)
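The chart from this slide is not preserved in the transcript. As a minimal sketch of the kind of aggregation such a view needs, the Python below counts failure records per category, using the same keys the Groovy listener sends (`name`, `number`, `cause`, `categories`). The sample records and category names are made up for illustration; in practice this grouping would more likely be a query against InfluxDB.

```python
from collections import Counter
from typing import Dict, Iterable


def count_by_category(records: Iterable[Dict[str, object]]) -> Counter:
    """Count failures per category; 'categories' is a comma-joined string."""
    counts: Counter = Counter()
    for record in records:
        categories = str(record.get("categories", ""))
        for category in filter(None, categories.split(",")):
            counts[category] += 1
    return counts


# Hypothetical records shaped like the events sent with tag "influx.bfa".
records = [
    {"name": "build-app", "number": 1, "cause": "Jar Hell", "categories": "infra"},
    {"name": "build-app", "number": 2, "cause": "Compilation error", "categories": "code"},
    {"name": "test-app", "number": 9, "cause": "No space left", "categories": "infra,disk"},
]
counts = count_by_category(records)
print(counts.most_common())
```

Splitting on the comma matters because the listener joins several categories into one string, so a single failure can count toward more than one category.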
Speed up compilation
CCache (problem)
CCache
• New node: empty local cache
• Old local cache: a lot of misses
+ Distributed cache solves all these problems
- Once a year it distributes a problem across the whole cluster
CCache (result)
Improve node utilization
LoadBalancer (problem)
LoadBalancer (solution)
• Default balancer is optimized for cache
• Cron jobs are pinned to different hosts
• Nothing to terminate/stop: no node is ever idle
+ Saturate Node Load Balancer: always puts all load on the oldest node
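The "oldest node first" idea can be sketched in Python as follows. This is not the plugin's implementation, only an illustration of the strategy named on the slide; the `Node` class, its fields and `pick_node` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    started_at: int   # launch time; a smaller value means an older node
    executors: int
    busy: int = 0

    @property
    def free(self) -> int:
        return self.executors - self.busy


def pick_node(nodes: List[Node]) -> Optional[Node]:
    """Return the oldest node that still has a free executor, if any."""
    candidates = [n for n in nodes if n.free > 0]
    return min(candidates, key=lambda n: n.started_at, default=None)


nodes = [Node("node-c", 300, 2), Node("node-a", 100, 2), Node("node-b", 200, 2)]
placements = []
for _ in range(5):
    node = pick_node(nodes)
    node.busy += 1
    placements.append(node.name)
print(placements)
```

Because load fills the oldest nodes first, the newest nodes are the ones left idle when demand drops, which gives the autoscaler something it can safely terminate.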
LoadBalancer (result)
Minimize impact
Jar Hell (problem)
java.io.InvalidClassException: hudson.util.StreamTaskListener;
local class incompatible: stream classdesc serialVersionUID = 1,
local class serialVersionUID = 294073340889094580
Jar Hell (explanation)
• Bug in the Jenkins remoting layer
• If the first run that uses a class is aborted, that class is "lost"
• The node does not recover
• Huge impact
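Jar Hell ("solution")
import hudson.model.Node
import jenkins.model.Jenkins

if (cause.getName().equals("Jar Hell")) {
    Node node = build.getBuiltOn()
    // Never relabel the master itself
    if (node != Jenkins.getInstance()) {
        // Replacing the labels keeps new builds off the broken node
        node.setLabelString("disabled_jar_hell")
    }
}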
Our daily dashboard
Resources
• FluentD
• InfluxDB plugin for FluentD
• JavaGC plugin for FluentD
• FluentD Plugin for Jenkins
• Groovy Event Listener Plugin
• Build Failure Analyzer Plugin
• Saturate Node Load Balancer Plugin
• CCache with memcached
• InfluxDB
Q/A?
alexander.akbashev@here.com
GitHub: Jimilian

More Related Content

What's hot

Akka-chan's Survival Guide for the Streaming World
Akka-chan's Survival Guide for the Streaming WorldAkka-chan's Survival Guide for the Streaming World
Akka-chan's Survival Guide for the Streaming WorldKonrad Malawski
 
Heroku for team collaboration
Heroku for team collaborationHeroku for team collaboration
Heroku for team collaborationJohn Stevenson
 
Akka Streams in Action @ ScalaDays Berlin 2016
Akka Streams in Action @ ScalaDays Berlin 2016Akka Streams in Action @ ScalaDays Berlin 2016
Akka Streams in Action @ ScalaDays Berlin 2016Konrad Malawski
 
Simon Laws – Apache Flink Cluster Deployment on Docker and Docker-Compose
Simon Laws – Apache Flink Cluster Deployment on Docker and Docker-ComposeSimon Laws – Apache Flink Cluster Deployment on Docker and Docker-Compose
Simon Laws – Apache Flink Cluster Deployment on Docker and Docker-ComposeFlink Forward
 
How Reactive Streams & Akka Streams change the JVM Ecosystem
How Reactive Streams & Akka Streams change the JVM EcosystemHow Reactive Streams & Akka Streams change the JVM Ecosystem
How Reactive Streams & Akka Streams change the JVM EcosystemKonrad Malawski
 
FASTEN: Scaling static analyses to ecosystem, presented at FOSDEM 2020 in Bru...
FASTEN: Scaling static analyses to ecosystem, presented at FOSDEM 2020 in Bru...FASTEN: Scaling static analyses to ecosystem, presented at FOSDEM 2020 in Bru...
FASTEN: Scaling static analyses to ecosystem, presented at FOSDEM 2020 in Bru...Fasten Project
 
Running tests for every commit: Gerrit, Jenkins, Docker, AWS
Running tests for every commit: Gerrit, Jenkins, Docker, AWSRunning tests for every commit: Gerrit, Jenkins, Docker, AWS
Running tests for every commit: Gerrit, Jenkins, Docker, AWSAlexander Akbashev
 
Stack Overflow - It's all about performance / Marco Cecconi (Stack Overflow)
Stack Overflow - It's all about performance / Marco Cecconi (Stack Overflow)Stack Overflow - It's all about performance / Marco Cecconi (Stack Overflow)
Stack Overflow - It's all about performance / Marco Cecconi (Stack Overflow)Ontico
 
Continuous Integration on Steroids
Continuous Integration on SteroidsContinuous Integration on Steroids
Continuous Integration on SteroidsAlexander Akbashev
 
[Japanese] How Reactive Streams and Akka Streams change the JVM Ecosystem @ R...
[Japanese] How Reactive Streams and Akka Streams change the JVM Ecosystem @ R...[Japanese] How Reactive Streams and Akka Streams change the JVM Ecosystem @ R...
[Japanese] How Reactive Streams and Akka Streams change the JVM Ecosystem @ R...Konrad Malawski
 
Nanog75, Network Device Property as Code
Nanog75, Network Device Property as CodeNanog75, Network Device Property as Code
Nanog75, Network Device Property as CodeDamien Garros
 
ATLRUG Announcements and Fun Facts - April 2016
ATLRUG Announcements and Fun Facts - April 2016ATLRUG Announcements and Fun Facts - April 2016
ATLRUG Announcements and Fun Facts - April 2016jasnow
 
Automating OWASP ZAP - DevCSecCon talk
Automating OWASP ZAP - DevCSecCon talk Automating OWASP ZAP - DevCSecCon talk
Automating OWASP ZAP - DevCSecCon talk Simon Bennetts
 
Elk ruminating on logs
Elk ruminating on logsElk ruminating on logs
Elk ruminating on logsMathew Beane
 
Jenkins vs. AWS CodePipeline
Jenkins vs. AWS CodePipelineJenkins vs. AWS CodePipeline
Jenkins vs. AWS CodePipelineSteffen Gebert
 
SplunkSummit 2015 - HTTP Event Collector, Simplified Developer Logging
SplunkSummit 2015 - HTTP Event Collector, Simplified Developer LoggingSplunkSummit 2015 - HTTP Event Collector, Simplified Developer Logging
SplunkSummit 2015 - HTTP Event Collector, Simplified Developer LoggingSplunk
 
Nginx performance monitoring with Dynatrace
Nginx performance monitoring with DynatraceNginx performance monitoring with Dynatrace
Nginx performance monitoring with DynatraceHarald Zeitlhofer
 
Supercharging CI/CD with GitLab and Rancher - June 2017 Online Meetup
Supercharging CI/CD with GitLab and Rancher - June 2017 Online MeetupSupercharging CI/CD with GitLab and Rancher - June 2017 Online Meetup
Supercharging CI/CD with GitLab and Rancher - June 2017 Online MeetupShannon Williams
 

What's hot (20)

Akka-chan's Survival Guide for the Streaming World
Akka-chan's Survival Guide for the Streaming WorldAkka-chan's Survival Guide for the Streaming World
Akka-chan's Survival Guide for the Streaming World
 
Heroku for team collaboration
Heroku for team collaborationHeroku for team collaboration
Heroku for team collaboration
 
Akka Streams in Action @ ScalaDays Berlin 2016
Akka Streams in Action @ ScalaDays Berlin 2016Akka Streams in Action @ ScalaDays Berlin 2016
Akka Streams in Action @ ScalaDays Berlin 2016
 
Simon Laws – Apache Flink Cluster Deployment on Docker and Docker-Compose
Simon Laws – Apache Flink Cluster Deployment on Docker and Docker-ComposeSimon Laws – Apache Flink Cluster Deployment on Docker and Docker-Compose
Simon Laws – Apache Flink Cluster Deployment on Docker and Docker-Compose
 
How Reactive Streams & Akka Streams change the JVM Ecosystem
How Reactive Streams & Akka Streams change the JVM EcosystemHow Reactive Streams & Akka Streams change the JVM Ecosystem
How Reactive Streams & Akka Streams change the JVM Ecosystem
 
FASTEN: Scaling static analyses to ecosystem, presented at FOSDEM 2020 in Bru...
FASTEN: Scaling static analyses to ecosystem, presented at FOSDEM 2020 in Bru...FASTEN: Scaling static analyses to ecosystem, presented at FOSDEM 2020 in Bru...
FASTEN: Scaling static analyses to ecosystem, presented at FOSDEM 2020 in Bru...
 
Automatic codefixes
Automatic codefixesAutomatic codefixes
Automatic codefixes
 
Running tests for every commit: Gerrit, Jenkins, Docker, AWS
Running tests for every commit: Gerrit, Jenkins, Docker, AWSRunning tests for every commit: Gerrit, Jenkins, Docker, AWS
Running tests for every commit: Gerrit, Jenkins, Docker, AWS
 
Stack Overflow - It's all about performance / Marco Cecconi (Stack Overflow)
Stack Overflow - It's all about performance / Marco Cecconi (Stack Overflow)Stack Overflow - It's all about performance / Marco Cecconi (Stack Overflow)
Stack Overflow - It's all about performance / Marco Cecconi (Stack Overflow)
 
Continuous Integration on Steroids
Continuous Integration on SteroidsContinuous Integration on Steroids
Continuous Integration on Steroids
 
[Japanese] How Reactive Streams and Akka Streams change the JVM Ecosystem @ R...
[Japanese] How Reactive Streams and Akka Streams change the JVM Ecosystem @ R...[Japanese] How Reactive Streams and Akka Streams change the JVM Ecosystem @ R...
[Japanese] How Reactive Streams and Akka Streams change the JVM Ecosystem @ R...
 
Nanog75, Network Device Property as Code
Nanog75, Network Device Property as CodeNanog75, Network Device Property as Code
Nanog75, Network Device Property as Code
 
ATLRUG Announcements and Fun Facts - April 2016
ATLRUG Announcements and Fun Facts - April 2016ATLRUG Announcements and Fun Facts - April 2016
ATLRUG Announcements and Fun Facts - April 2016
 
Automating OWASP ZAP - DevCSecCon talk
Automating OWASP ZAP - DevCSecCon talk Automating OWASP ZAP - DevCSecCon talk
Automating OWASP ZAP - DevCSecCon talk
 
Elk ruminating on logs
Elk ruminating on logsElk ruminating on logs
Elk ruminating on logs
 
Jenkins vs. AWS CodePipeline
Jenkins vs. AWS CodePipelineJenkins vs. AWS CodePipeline
Jenkins vs. AWS CodePipeline
 
Testing at Stream-Scale
Testing at Stream-ScaleTesting at Stream-Scale
Testing at Stream-Scale
 
SplunkSummit 2015 - HTTP Event Collector, Simplified Developer Logging
SplunkSummit 2015 - HTTP Event Collector, Simplified Developer LoggingSplunkSummit 2015 - HTTP Event Collector, Simplified Developer Logging
SplunkSummit 2015 - HTTP Event Collector, Simplified Developer Logging
 
Nginx performance monitoring with Dynatrace
Nginx performance monitoring with DynatraceNginx performance monitoring with Dynatrace
Nginx performance monitoring with Dynatrace
 
Supercharging CI/CD with GitLab and Rancher - June 2017 Online Meetup
Supercharging CI/CD with GitLab and Rancher - June 2017 Online MeetupSupercharging CI/CD with GitLab and Rancher - June 2017 Online Meetup
Supercharging CI/CD with GitLab and Rancher - June 2017 Online Meetup
 

Similar to Мониторинг облачной CI системы на примере Jenkins

Serverless in Production, an experience report (AWS UG South Wales)
Serverless in Production, an experience report (AWS UG South Wales)Serverless in Production, an experience report (AWS UG South Wales)
Serverless in Production, an experience report (AWS UG South Wales)Yan Cui
 
Serverless in production, an experience report (FullStack 2018)
Serverless in production, an experience report (FullStack 2018)Serverless in production, an experience report (FullStack 2018)
Serverless in production, an experience report (FullStack 2018)Yan Cui
 
Continuous Delivery - Devoxx Morocco 2016
Continuous Delivery - Devoxx Morocco 2016Continuous Delivery - Devoxx Morocco 2016
Continuous Delivery - Devoxx Morocco 2016Rafał Leszko
 
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017Demi Ben-Ari
 
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...Codemotion
 
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...Demi Ben-Ari
 
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...Demi Ben-Ari
 
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...Codemotion
 
Continuous Delivery - Voxxed Days Thessaloniki 21.10.2016
Continuous Delivery - Voxxed Days Thessaloniki 21.10.2016Continuous Delivery - Voxxed Days Thessaloniki 21.10.2016
Continuous Delivery - Voxxed Days Thessaloniki 21.10.2016Rafał Leszko
 
CI Provisioning with OpenStack - Gidi Samuels - OpenStack Day Israel 2016
CI Provisioning with OpenStack - Gidi Samuels - OpenStack Day Israel 2016CI Provisioning with OpenStack - Gidi Samuels - OpenStack Day Israel 2016
CI Provisioning with OpenStack - Gidi Samuels - OpenStack Day Israel 2016Cloud Native Day Tel Aviv
 
Serverless in production, an experience report
Serverless in production, an experience reportServerless in production, an experience report
Serverless in production, an experience reportYan Cui
 
Neotys PAC 2018 - Jonathon Wright
Neotys PAC 2018 - Jonathon WrightNeotys PAC 2018 - Jonathon Wright
Neotys PAC 2018 - Jonathon WrightNeotys_Partner
 
DevOps on AWS: Deep Dive on Continuous Delivery and the AWS Developer Tools
DevOps on AWS: Deep Dive on Continuous Delivery and the AWS Developer ToolsDevOps on AWS: Deep Dive on Continuous Delivery and the AWS Developer Tools
DevOps on AWS: Deep Dive on Continuous Delivery and the AWS Developer ToolsAmazon Web Services
 
Continuous Delivery - Voxxed Days Cluj-Napoca 2017
Continuous Delivery - Voxxed Days Cluj-Napoca 2017Continuous Delivery - Voxxed Days Cluj-Napoca 2017
Continuous Delivery - Voxxed Days Cluj-Napoca 2017Rafał Leszko
 
The future of paas is serverless
The future of paas is serverlessThe future of paas is serverless
The future of paas is serverlessYan Cui
 
Intro to open source telemetry linux con 2016
Intro to open source telemetry   linux con 2016Intro to open source telemetry   linux con 2016
Intro to open source telemetry linux con 2016Matthew Broberg
 
The Usual Suspects - Red Hat Developer Day 2012-11-01
The Usual Suspects - Red Hat Developer Day 2012-11-01The Usual Suspects - Red Hat Developer Day 2012-11-01
The Usual Suspects - Red Hat Developer Day 2012-11-01Jorge Hidalgo
 
Puppet ENC – a ServiceNow Scoped Application; Richard Romanus
Puppet ENC – a ServiceNow Scoped Application; Richard RomanusPuppet ENC – a ServiceNow Scoped Application; Richard Romanus
Puppet ENC – a ServiceNow Scoped Application; Richard RomanusPuppet
 
NDC 2011 - Let me introduce my Moncai
NDC 2011 - Let me introduce my MoncaiNDC 2011 - Let me introduce my Moncai
NDC 2011 - Let me introduce my Moncaimoncai
 
Riga Dev Day - Automated Android Continuous Integration
Riga Dev Day - Automated Android Continuous IntegrationRiga Dev Day - Automated Android Continuous Integration
Riga Dev Day - Automated Android Continuous IntegrationNicolas Fränkel
 

Similar to Мониторинг облачной CI системы на примере Jenkins (20)

Serverless in Production, an experience report (AWS UG South Wales)
Serverless in Production, an experience report (AWS UG South Wales)Serverless in Production, an experience report (AWS UG South Wales)
Serverless in Production, an experience report (AWS UG South Wales)
 
Serverless in production, an experience report (FullStack 2018)
Serverless in production, an experience report (FullStack 2018)Serverless in production, an experience report (FullStack 2018)
Serverless in production, an experience report (FullStack 2018)
 
Continuous Delivery - Devoxx Morocco 2016
Continuous Delivery - Devoxx Morocco 2016Continuous Delivery - Devoxx Morocco 2016
Continuous Delivery - Devoxx Morocco 2016
 
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017
 
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
 
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...
 
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
 
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
 
Continuous Delivery - Voxxed Days Thessaloniki 21.10.2016
Continuous Delivery - Voxxed Days Thessaloniki 21.10.2016Continuous Delivery - Voxxed Days Thessaloniki 21.10.2016
Continuous Delivery - Voxxed Days Thessaloniki 21.10.2016
 
CI Provisioning with OpenStack - Gidi Samuels - OpenStack Day Israel 2016
CI Provisioning with OpenStack - Gidi Samuels - OpenStack Day Israel 2016CI Provisioning with OpenStack - Gidi Samuels - OpenStack Day Israel 2016
CI Provisioning with OpenStack - Gidi Samuels - OpenStack Day Israel 2016
 
Serverless in production, an experience report
Serverless in production, an experience reportServerless in production, an experience report
Serverless in production, an experience report
 
Neotys PAC 2018 - Jonathon Wright
Neotys PAC 2018 - Jonathon WrightNeotys PAC 2018 - Jonathon Wright
Neotys PAC 2018 - Jonathon Wright
 
DevOps on AWS: Deep Dive on Continuous Delivery and the AWS Developer Tools
DevOps on AWS: Deep Dive on Continuous Delivery and the AWS Developer ToolsDevOps on AWS: Deep Dive on Continuous Delivery and the AWS Developer Tools
DevOps on AWS: Deep Dive on Continuous Delivery and the AWS Developer Tools
 
Continuous Delivery - Voxxed Days Cluj-Napoca 2017
Continuous Delivery - Voxxed Days Cluj-Napoca 2017Continuous Delivery - Voxxed Days Cluj-Napoca 2017
Continuous Delivery - Voxxed Days Cluj-Napoca 2017
 
The future of paas is serverless
The future of paas is serverlessThe future of paas is serverless
The future of paas is serverless
 
Intro to open source telemetry linux con 2016
Intro to open source telemetry   linux con 2016Intro to open source telemetry   linux con 2016
Intro to open source telemetry linux con 2016
 
The Usual Suspects - Red Hat Developer Day 2012-11-01
The Usual Suspects - Red Hat Developer Day 2012-11-01The Usual Suspects - Red Hat Developer Day 2012-11-01
The Usual Suspects - Red Hat Developer Day 2012-11-01
 
Puppet ENC – a ServiceNow Scoped Application; Richard Romanus
Puppet ENC – a ServiceNow Scoped Application; Richard RomanusPuppet ENC – a ServiceNow Scoped Application; Richard Romanus
Puppet ENC – a ServiceNow Scoped Application; Richard Romanus
 
NDC 2011 - Let me introduce my Moncai
NDC 2011 - Let me introduce my MoncaiNDC 2011 - Let me introduce my Moncai
NDC 2011 - Let me introduce my Moncai
 
Riga Dev Day - Automated Android Continuous Integration
Riga Dev Day - Automated Android Continuous IntegrationRiga Dev Day - Automated Android Continuous Integration
Riga Dev Day - Automated Android Continuous Integration
 

Monitoring a cloud CI system, using Jenkins as an example

  • 1. Monitoring a cloud CI system, using Jenkins as an example. Alexander Akbashev, HERE Technologies
  • 2. HERE Technologies: HERE Technologies, the Open Location Platform company, enables people, enterprises and cities to harness the power of location. By making sense of the world through the lens of location we empower our customers to achieve better outcomes – from helping a city manage its infrastructure or an enterprise optimize its assets to guiding drivers to their destination safely. To learn more about HERE, including our new generation of cloud-based location platform services, visit http://360.here.com and www.here.com
  • 3. Context • Every change goes through pre-submit validation • Feedback time is 15-40 minutes • A lot of products and platforms • 6 Jenkins masters • Up to 185k runs per day on the biggest one • 20k runs per day on average
  • 4. if something goes wrong…
  • 5. What can go wrong? • Compilation is broken • Tests are broken • Network issues
  • 6. What can go wrong? • Compilation is broken • Tests are broken • Network issues • Jenkins master crashed • EC2 plugin does not start new nodes • No connection to labs • Cannot clean up workspace • AWS S3 is down • Git master dies • Git replica is broken • Compiler cache was invalidated • Hit the limit of API calls to AWS • Job was deleted • UI is blocked • Queue is too big • System.exit(1) • NFS is stuck • Deadlock in Jenkins • Staging started to give feedback • Restarted the wrong server
  • 19. Monitoring Plugin (March 2016) + Easy to install + Nothing to maintain - When Jenkins is slow, there is no monitoring - Monitors mainly JVM stats - Only one instance - Not scalable
  • 20. Monitoring Plugin (nowadays) + Easy to install + Nothing to maintain - When Jenkins is slow, there is no monitoring - Monitors mainly JVM stats - Only one instance - Not scalable + Can export to InfluxDB/CloudWatch/Graphite
  • 21. Let’s craft our own monitoring!
  • 22. Designing our own monitoring (March 2016): Jenkins -(API)- Python -(API)- InfluxDB
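  • 23. Designing our own monitoring (March 2016): the Python poller
    import influxdb
    import jenkins

    # Poll the Jenkins build queue through its API and push each queued item to InfluxDB.
    # The URL, port, database and measurement names are placeholders.
    j = jenkins.Jenkins("http://jenkins.host")
    influx_server = influxdb.InfluxDBClient(host="influxdb.host", port=8086, database="stats")
    queue_info = j.get_queue_info()
    for q in queue_info:
        # The queue API nests the job name under "task"; "why" is the reason the item is waiting.
        influx_server.write_points([{"measurement": "queue",
                                     "tags": {"name": q["task"]["name"]},
                                     "fields": {"reason": q["why"]}}])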
  • 33. Designing our own monitoring (March 2016): Jenkins -(API)- Python -(API)- InfluxDB + Simple + Worked for 18 months - Based on polling - Common code has to be maintained - Not all data is accessible - Extra load on Jenkins
  • 34. Let’s do event-based monitoring!
  • 36. Jenkins Core
    public abstract class RunListener<R extends Run> implements ExtensionPoint {
        public void onCompleted(R r, TaskListener listener) {}
        public void onFinalized(R r) {}
        public void onStarted(R r, TaskListener listener) {}
        public void onDeleted(R r) {}
    }
  • 38. Groovy Event Listener Plugin (April 2016) • Executes custom Groovy code for every event • Supports RunListener
  • 39. Groovy Event Listener Plugin (nowadays) • Executes custom Groovy code for every event • Supports RunListener, ComputerListener, ItemListener, QueueListener • Works at scale • Allows a custom classpath
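  • 40. Groovy Event Listener Plugin
    import jenkins.metrics.impl.TimeInQueueAction  // provided by the Metrics plugin

    if (event == 'RunListener.onFinalized') {
        def build = Thread.currentThread().executable
        def queueAction = build.getAction(TimeInQueueAction.class)
        def queuing = queueAction.getQueuingDurationMillis()
        // Log the build number and how long the run waited in the queue (milliseconds).
        log.info "number=${build.number}, queue_duration=${queuing}"
    }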
  • 41. OK, we have events, but how do we get them into the database?
  • 49. FluentD • Processes 13,000 events/second/core • Retry/buffer/routing • Easy to extend • Simple • Reliable • Memory footprint is 30-40 MB • Ruby
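A rough sanity check of the throughput figure against the load from the Context slide can be sketched in a few lines. The number of events per run is an assumption (one per RunListener callback), not a figure from the talk, and an average says nothing about peaks.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_events_per_second(runs_per_day: int, events_per_run: int) -> float:
    """Average event rate produced by one Jenkins master over a day."""
    return runs_per_day * events_per_run / SECONDS_PER_DAY


# 185k runs/day is the busiest master from the Context slide.
RUNS_PER_DAY = 185_000
# Assumption: four events per run, one for each RunListener callback.
EVENTS_PER_RUN = 4
# Throughput figure quoted on the FluentD slide.
FLUENTD_EVENTS_PER_SECOND_PER_CORE = 13_000

rate = average_events_per_second(RUNS_PER_DAY, EVENTS_PER_RUN)
headroom = FLUENTD_EVENTS_PER_SECOND_PER_CORE / rate

print(f"average load: {rate:.1f} events/s")
print(f"headroom on one core: about {headroom:.0f}x")
```

Under these assumptions the average load is far below what a single core is quoted to handle, so bursts rather than steady-state volume would be the thing to watch.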
  • 52. FluentD (diagram): Jenkins -(JSON)- FluentD -(JSON)- InfluxDB; other labels on the diagram: Postgres, SQL, Logs
  • 53. FluentD. Config.
    <match **.influx.**>
      type influxdb
      host influxdb.host
      port 8086
      dbname stats
      auto_tags "true"
      timestamp_tag timestamp
      time_precision s
    </match>
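To make the last hop concrete, here is a minimal sketch of how a JSON event could be rendered as an InfluxDB line-protocol record with a timestamp in seconds, matching `time_precision s`. It is not the fluentd plugin's code; the function names and the split between tags and fields are assumptions made for illustration.

```python
def escape_key(text: str) -> str:
    """Escape commas, equals signs and spaces in tag keys, tag values and field keys."""
    for ch in (",", "=", " "):
        text = text.replace(ch, "\\" + ch)
    return text


def escape_measurement(text: str) -> str:
    """Escape commas and spaces in a measurement name."""
    return text.replace(",", "\\,").replace(" ", "\\ ")


def format_field(value) -> str:
    """Render a field value: booleans, integers (with 'i' suffix), floats, quoted strings."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(measurement: str, tags: dict, fields: dict, timestamp_s: int) -> str:
    """Build one line: measurement,tag=value,... field=value,... timestamp."""
    tag_part = "".join(
        f",{escape_key(k)}={escape_key(str(v))}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(f"{escape_key(k)}={format_field(v)}" for k, v in fields.items())
    return f"{escape_measurement(measurement)}{tag_part} {field_part} {timestamp_s}"


# Hypothetical event; the values are made up for the example.
line = to_line_protocol(
    "bfa",
    tags={"name": "my job", "node": "node-1"},
    fields={"number": 42, "cause": "Jar Hell"},
    timestamp_s=1500000000,
)
print(line)
```

Tags are indexed in InfluxDB and fields are not, so values used for grouping in dashboards (job name, node) are the natural candidates for tags.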
  • 58. OK, we have events and we have FluentD, but how do we pass the events to it?
  • 63. FluentD Plugin for Jenkins • Developed at HERE Technologies • Very simple • Supports JSON • Available as a post-build step
  • 64. FluentD Plugin for Jenkins https://github.com/jenkinsci/fluentd-plugin
  • 65. Great! Let’s do something with this data!
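  • 68. Build Failure Analyzer (code)
    import com.sonyericsson.jenkins.plugins.bfa.model.FailureCauseBuildAction

    // jobName and node are assumed to be defined earlier in the listener script.
    def bfa = build.getAction(FailureCauseBuildAction.class)
    def causes = bfa.getFailureCauseDisplayData().getFoundFailureCauses()
    // Emit one record per failure cause found for this build.
    for (def cause : causes) {
        final Map<String, Object> data = new HashMap<>()
        data.put("name", jobName)
        data.put("number", build.number)
        data.put("cause", cause.getName())
        data.put("categories", cause.getCategories().join(','))
        data.put("timestamp", build.timestamp.timeInMillis)
        data.put("node", node)
        // The "influx.bfa" tag routes the record to the InfluxDB output in FluentD.
        context.logger.log("influx.bfa", data)
    }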
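Once records of this shape are stored, a first thing to do with them is to count failures per category. The sketch below works on plain dictionaries with the same keys as the listener code; the sample records are invented for illustration and are not data from the talk.

```python
from collections import Counter


def count_by_category(records):
    """Count failure records per category.

    "categories" is a comma-joined string, as produced by join(',') in the listener.
    """
    counts = Counter()
    for record in records:
        for category in filter(None, record["categories"].split(",")):
            counts[category] += 1
    return counts


# Invented sample records, using the field names from the listener script.
sample = [
    {"name": "build-linux", "number": 101, "cause": "Jar Hell", "categories": "infra,jenkins"},
    {"name": "build-linux", "number": 102, "cause": "Compilation error", "categories": "code"},
    {"name": "test-android", "number": 7, "cause": "No connection to labs", "categories": "infra"},
]

by_category = count_by_category(sample)
for category, count in by_category.most_common():
    print(category, count)
```

Splitting infrastructure failures from code failures in this way shows whether a red build is the developer's problem or the CI team's.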
  • 81. CCache • New node: empty local cache • Old local cache: a lot of misses + A distributed cache solves all these problems - Once a year it spreads a problem across the whole cluster
  • 89. LoadBalancer (solution) • The default balancer is optimized for cache • Cron jobs are pinned to different hosts • Nothing to terminate/stop: there are no idle nodes + Saturate Node Load Balancer: always puts all load on the oldest node
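The saturation idea from the last bullet can be sketched as a selection rule: among nodes that still have a free executor, pick the one that has been running longest, so newer nodes drain and can be terminated. This is an illustration only, not the plugin's implementation; the `Node` type and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Node:
    name: str
    started_at: float  # launch time, seconds since the epoch
    executors: int     # total executors on the node
    busy: int          # executors currently running a build

    @property
    def free(self) -> int:
        return self.executors - self.busy


def pick_node(nodes: Sequence[Node]) -> Optional[Node]:
    """Return the oldest node that still has a free executor, or None if all are full."""
    candidates = [n for n in nodes if n.free > 0]
    return min(candidates, key=lambda n: n.started_at, default=None)


nodes = [
    Node("node-a", started_at=100.0, executors=2, busy=2),  # oldest, but full
    Node("node-b", started_at=200.0, executors=2, busy=1),  # next oldest, has room
    Node("node-c", started_at=300.0, executors=2, busy=0),  # newest, stays idle
]

chosen = pick_node(nodes)
print(chosen.name if chosen else "no capacity")
```

With this rule the newest node receives work only when every older node is full, which is what lets idle nodes appear and be stopped.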
  • 92. Jar Hell (problem) java.io.InvalidClassException: hudson.util.StreamTaskListener; local class incompatible: stream classdesc serialVersionUID = 1, local class serialVersionUID = 294073340889094580
  • 97. Jar Hell (explanation) • Bug in the Jenkins Remoting layer • If the first run that uses a class is aborted, that class is “lost” • The node does not recover • Huge impact
  • 98. Jar Hell (“solution”)
    if (cause.getName().equals("Jar Hell")) {
        // Take the affected node out of rotation by relabeling it; never relabel the master itself.
        Node node = build.getBuiltOn()
        if (node != Jenkins.getInstance()) {
            node.setLabelString("disabled_jar_hell")
        }
    }
  • 102. Resources • FluentD • Influxdb plugin for fluentd • JavaGC plugin for fluentd • FluentD Plugin • Groovy Event Listener Plugin • Build Failure Analyzer Plugin • Saturate Node Load Balancer Plugin • CCache with memcache • InfluxDB