Java application monitoring with Dropwizard Metrics and graphite

How to correlate system monitoring and application monitoring, using Graphite as the backend for both collectd and application metrics.


  1. Java application monitoring with Dropwizard Metrics and Graphite. Roberto Franchini (@robfrankie). Bologna, April 10th, 2015.
  2. whoami(1): 15 years of experience, proud to be a programmer. Writes software for information extraction, NLP, opinion mining (@scale), and a lot of other buzzwords. Implements scalable architectures. Plays with servers (don't say that to my sysadmin). Member of the JUG-Torino coordination team. Feedback: http://lanyrd.com/sdkghq
  3. Company
  4. Agenda: Intro; Scenario; System monitoring; Application monitoring (dark side); Application monitoring (light side); Dropwizard Metrics; Dashboards
  5. Quotes
  6. Business value: "Our code generates business value when it runs, not when we write it. We need to know what our code does when it runs. We can’t do this unless we measure it." (Codahale)
  7. SLA driven: "Have an SLA for your service. Measure and report performance against the SLA." (Ben Treynor, Google Inc.)
  8. Scenario
  9. Infrastructure: 45 bare metal servers; Nginx; Jetty (mainly embedded); PostgreSQL; GlusterFS (28 TB and growing); Kestrel; Kafka on the horizon; Redis; Jenkins as scheduler (cron on steroids)
  10. Software: Java shop; home-made distributed search engine; home-made little PaaS; Docker on the go; more than 120 webapps; more than 100 batch jobs; NRT stream processing jobs running 24x7
  11. Java: Java is not dead, and it is almost everywhere. The language is evolving. The JVM is the most advanced managed environment in which to run your code. Choose your style: Scala, Clojure, Groovy
  12. Who uses it (cool side): Twitter, Spotify, Google, Netflix, LinkedIn
  13. Who uses it (real world): your bank
  14. Systems monitoring
  15. Collectd (in use since 2012). Systems: load, df, traffic. Java (via JMX): heap. Queues: items, size. DBMS: connections, size
  16. Collectd charts: traffic
  17. Collectd to Graphite: collectd writes to Graphite through the write_graphite plugin; better charts; dashboards are easy; dashboards are meaningful
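A minimal collectd configuration sketch for the write_graphite plugin named on this slide. The node name and the prefix are assumptions chosen for illustration, the host name is borrowed from the Graphite reporter slide, and 2003 is Graphite's usual plaintext port; none of these values are confirmed as the ones used in the talk.

```
# Hypothetical collectd.conf fragment: forward collectd values to Graphite.
LoadPlugin write_graphite

<Plugin write_graphite>
  <Node "graphite">
    Host "graphite.example.com"
    Port "2003"
    Protocol "tcp"
    Prefix "collectd."
  </Node>
</Plugin>
```

With a prefix like this, system metrics land under their own branch of the Graphite tree, next to whatever branch the applications write to.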
  18. Graphite dashboard: servers load dashboard
  19. Grafana: a beautiful frontend for Graphite. Dashboards are meaningful and BEAUTIFUL (you can send screenshots to managers now)
  20. Grafana dashboard
  21. Application monitoring
  22. Requirements: measure behaviors; send to Graphite; integrate with system measures; correlate with system measures
  23. Repeat with me: correlate application and system metrics
  24. Correlate: graphite, collectd, applications, grafana
  25. To do what? Discover bottlenecks; post-mortem analysis; SLA monitoring; IO impact; network traffic; memory
  26. User story: given the application running, when the manager comes, then I want to show a big green number
  27. The answer: 42
  28. In detail: “Application monitoring? WHAT?” “Ok, let me explain. What is the app doing right now? How is the app performing right now? And then graph it!” “Ok, I got it!” “Let me see”
  29. 5 minutes later:

          public class PoorManJavaMetrics {
              int called;      // plain fields, updated without any synchronization
              long totalTime;

              public void doThings() {
                  final long start = System.currentTimeMillis();
                  // heavy business logic
                  called++;
                  final long end = System.currentTimeMillis();
                  final long duration = end - start;
                  totalTime += duration;
              }

              public void logStats() {
                  System.out.println("---stats---");
                  // I can’t write that
              }
          }
  30. DIY Java monitoring: maybe better with a centralized utility class (maybe…). Thread safety? Send measures to different backends? Log to different logging systems?
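To make the thread-safety question concrete, here is a minimal sketch of how the hand-rolled class from the previous slide could be made safe for concurrent callers using only the JDK. The class name `ThreadSafePoorManMetrics` and its methods are hypothetical and not part of the talk; the sketch only addresses thread safety, not the backend and logging questions.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical thread-safe variant of the hand-rolled metrics class. */
final class ThreadSafePoorManMetrics {
    // LongAdder tolerates concurrent updates without explicit locking.
    private final LongAdder called = new LongAdder();
    private final LongAdder totalNanos = new LongAdder();

    /** Runs the task and records one call plus its elapsed time. */
    void measure(Runnable task) {
        final long start = System.nanoTime();
        try {
            task.run();
        } finally {
            called.increment();
            totalNanos.add(System.nanoTime() - start);
        }
    }

    long calls() {
        return called.sum();
    }

    long totalNanos() {
        return totalNanos.sum();
    }

    public static void main(String[] args) throws InterruptedException {
        final ThreadSafePoorManMetrics metrics = new ThreadSafePoorManMetrics();
        final Runnable worker = () -> {
            for (int i = 0; i < 1_000; i++) {
                metrics.measure(() -> { /* heavy business logic */ });
            }
        };
        final Thread first = new Thread(worker);
        final Thread second = new Thread(worker);
        first.start();
        second.start();
        first.join();
        second.join();
        System.out.println("calls=" + metrics.calls() + " totalNanos=" + metrics.totalNanos());
    }
}
```

Even with the counters fixed, reporting, naming, and export are still missing, which is the gap a metrics library is meant to fill.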
  31. Java monitoring: measure in the code; thread safety; counters, gauges, meters etc.; log metrics; graph metrics; export metrics
  32. NOT only JMX: we want more; integrate JMX metrics from third-party libs
  33. Dropwizard Metrics: https://dropwizard.github.io/metrics/3.1.0/
  34. Overview. Code instrumentation: meters, gauges, counters, histograms. Reporters: console, csv, slf4j, jmx. Web app instrumentation. Web app health check. Advanced reporters: graphite, ganglia
  35. Overview. Third-party libs: aspectj, influxdb, statsd, cassandra
  36. Main parts. MetricRegistry: a collection of all the metrics for your application; usually one instance per JVM; use more than one in a multi-WAR deployment. Names: each metric has a unique name; the registry has helper methods for creating names:

          MetricRegistry.name(Queue.class, "items", "total") // com.example.Queue.items.total
          MetricRegistry.name(Queue.class, "size", "byte")   // com.example.Queue.size.byte
  37. Metrics. Gauges: the simplest metric type, it just returns a value. Counters: an incrementing and decrementing 64-bit integer.

          final Map<String, String> keys = new HashMap<>();
          // The gauge reads the current map size every time it is reported.
          registry.register(MetricRegistry.name("gauge", "keys"), new Gauge<Integer>() {
              @Override
              public Integer getValue() {
                  return keys.size();
              }
          });

          final Counter counter = registry.counter(MetricRegistry.name("counter", "inserted"));
          counter.inc();
  38. Metrics. Histograms: measure the distribution of values in a stream of data. Meters: measure the rate at which a set of events occurs.

          final Histogram resultCounts = registry.histogram(MetricRegistry.name(ProductDAO.class, "result-counts"));
          resultCounts.update(results.size());

          final Meter meter = registry.meter(MetricRegistry.name("meter", "inserted"));
          meter.mark();
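To illustrate what "the distribution of values" means in practice, the following sketch computes nearest-rank percentiles over a complete sample with plain JDK code. It is not the library's algorithm: a metrics histogram typically keeps a bounded sample of the stream rather than every value. The class `PercentileSketch` and the sample data are made up for this example.

```java
import java.util.Arrays;

/** Hypothetical helper: exact nearest-rank percentiles over a full sample. */
final class PercentileSketch {
    /** Returns the smallest sample such that at least percent% of samples are less than or equal to it. */
    static long percentile(long[] samples, int percent) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (percent < 1 || percent > 100) {
            throw new IllegalArgumentException("percent must be in 1..100");
        }
        final long[] sorted = samples.clone();
        Arrays.sort(sorted);
        // Nearest-rank method: rank = ceil(percent * n / 100), in integer arithmetic.
        final int rank = (int) (((long) percent * sorted.length + 99) / 100);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Made-up result counts, standing in for values passed to a histogram.
        final long[] resultCounts = {3, 1, 4, 1, 5, 9, 2, 6, 5, 3};
        System.out.println("median = " + percentile(resultCounts, 50));
        System.out.println("p75    = " + percentile(resultCounts, 75));
        System.out.println("p99    = " + percentile(resultCounts, 99));
    }
}
```

Percentiles such as p95 or p99 are usually more telling on a dashboard than the mean, because a handful of slow outliers barely move an average.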
  39. Metrics. Timers: a histogram of the duration of a type of event and a meter of the rate of its occurrence.

          final Timer timer = registry.timer(MetricRegistry.name("timer", "inserted"));
          final Timer.Context context = timer.time();
          // timed ops
          context.stop();
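The start/stop pattern above can lose a measurement when the timed code throws before `stop()` is reached. The sketch below shows the same idea with a closeable context and try-with-resources, using only the JDK. `TimerSketch` and its injected clock are hypothetical names for illustration; whether the real `Timer.Context` can be used in try-with-resources depends on it being closeable in the version in use, which is worth checking.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.function.LongSupplier;

/** Hypothetical stand-in for a timer: counts events and sums their durations. */
final class TimerSketch {
    private final LongSupplier nanoClock;
    private final LongAdder count = new LongAdder();
    private final LongAdder totalNanos = new LongAdder();

    TimerSketch(LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
    }

    /** Starts timing; closing the returned context records the elapsed time. */
    Context time() {
        return new Context(nanoClock.getAsLong());
    }

    long count() {
        return count.sum();
    }

    long totalNanos() {
        return totalNanos.sum();
    }

    final class Context implements AutoCloseable {
        private final long start;

        Context(long start) {
            this.start = start;
        }

        @Override
        public void close() {
            count.increment();
            totalNanos.add(nanoClock.getAsLong() - start);
        }
    }

    public static void main(String[] args) {
        final TimerSketch timer = new TimerSketch(System::nanoTime);
        // The context is closed even if the timed code throws.
        try (TimerSketch.Context ignored = timer.time()) {
            // timed ops
        }
        System.out.println("count=" + timer.count() + " totalNanos=" + timer.totalNanos());
    }
}
```

Injecting the clock keeps the sketch deterministic under test; production code would simply pass `System::nanoTime`.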
  40. Reporters. JMX: exposes metrics as JMX beans. Console: periodically reports metrics to the console. CSV: appends to a set of .csv files in a given directory. SLF4J: logs metrics to a logger. Graphite: streams metrics to Graphite
  41. Console reporter:

          final ConsoleReporter console = ConsoleReporter.forRegistry(registry)
                  .outputTo(System.out)
                  .convertRatesTo(TimeUnit.MINUTES)
                  .build();
          console.start(10, TimeUnit.SECONDS);

      Sample output:

          4/9/15 11:45:57 PM =============================================================

          -- Gauges ----------------------------------------------------------------------
          gauge.keys
                       value = 9901

          -- Counters --------------------------------------------------------------------
          counter.inserted
                       count = 9901

          -- Meters ----------------------------------------------------------------------
          meter.inserted
                       count = 9901
  42. slf4j reporter:

          final Slf4jReporter logging = Slf4jReporter.forRegistry(registry)
                  .convertDurationsTo(TimeUnit.MINUTES)
                  .outputTo(LoggerFactory.getILoggerFactory().getLogger("metrics"))
                  .build();
          logging.start(20, TimeUnit.SECONDS);

      Sample output:

          0 [metrics-logger-reporter-2-thread-1] INFO metrics - type=GAUGE, name=gauge.keys, value=901
          2 [metrics-logger-reporter-2-thread-1] INFO metrics - type=COUNTER, name=counter.inserted, count=901
          6 [metrics-logger-reporter-2-thread-1] INFO metrics - type=METER, name=meter.inserted, count=901, mean_rate=90.03794743129822, m1=81.7831205903394, m5=80.52726521433198, m15=80.30969500950305, rate_unit=events/second
          14 [metrics-logger-reporter-2-thread-1] INFO metrics - type=TIMER, name=timer.inserted, count=900, min=1.9083333333333335E-8, max=0.016671673633333335, mean=1.667999479718904E-4, stddev=0.0016585493668388946, median=7.196666666666667E-8, p75=1.3421666666666667E-7, p95=2.7838333333333335E-7, p98=7.131833333333334E-7, p99=0.01666843721666667, p999=0.016671673633333335, mean_rate=89.8720293570475, m1=81.59911170741354, m5=80.33057092356765, m15=80.11080303990207, rate_unit=events/second, duration_unit=minutes
  43. Graphite reporter:

          final Graphite graphite = new Graphite(new InetSocketAddress("graphite.example.com", 2003));
          final GraphiteReporter reporter = GraphiteReporter.forRegistry(registry)
                  .prefixedWith("web1.example.com")
                  .convertRatesTo(TimeUnit.SECONDS)
                  .convertDurationsTo(TimeUnit.MILLISECONDS)
                  .filter(MetricFilter.ALL)
                  .build(graphite);
          reporter.start(1, TimeUnit.MINUTES);

      Metrics can be prefixed, which is useful to separate metrics by environment: prod, test
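For readers wondering what ends up on the wire, this sketch formats one line of Graphite's plaintext protocol, which has the shape `<metric path> <value> <timestamp in epoch seconds>` followed by a newline. The helper `GraphiteLineSketch.line` is a hypothetical name written for this example and is not part of the Metrics library; the metric name `counter.inserted.count` is an assumed example of a reported name.

```java
/** Hypothetical helper that formats one line of Graphite's plaintext protocol. */
final class GraphiteLineSketch {
    /** Builds "<prefix>.<name> <value> <epochSeconds>\n". */
    static String line(String prefix, String name, double value, long epochSeconds) {
        return prefix + "." + name + " " + value + " " + epochSeconds + "\n";
    }

    public static void main(String[] args) {
        final long now = System.currentTimeMillis() / 1000L;
        // Prints the line that a client would write to the Graphite socket.
        System.out.print(line("web1.example.com", "counter.inserted.count", 9901, now));
    }
}
```

Because Graphite treats every dot as a hierarchy separator, a prefix such as `web1.example.com` shows up as three nested levels in the metric tree.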
  44. Metrics naming. Dot notation by getClass(): easy to create, but gives very long names on dashboards. Maybe better to use <namespace>.<instrumented section>.<target (noun)>.<action (past tense verb)>, such as accounts.authentication.password.failed. Use a prefix: prod, test, dev, local. Differentiate data retention on Graphite by prefix
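The naming scheme above can be captured in a tiny helper so that every metric follows the same shape. This is only a sketch: the class `MetricNames` and its validation rules are assumptions made for the example, not something shown in the talk.

```java
/** Hypothetical helper enforcing <prefix>.<namespace>.<section>.<target>.<action>. */
final class MetricNames {
    static String name(String prefix, String namespace, String section, String target, String action) {
        final String[] parts = {prefix, namespace, section, target, action};
        for (String part : parts) {
            if (part == null || part.isEmpty()) {
                throw new IllegalArgumentException("every name part must be non-empty");
            }
            if (part.indexOf('.') >= 0) {
                // A dot inside a part would silently add a level to the Graphite tree.
                throw new IllegalArgumentException("name part must not contain a dot: " + part);
            }
        }
        return String.join(".", parts);
    }

    public static void main(String[] args) {
        System.out.println(name("prod", "accounts", "authentication", "password", "failed"));
    }
}
```

Keeping the environment as the first segment makes it straightforward to match whole environments with a single pattern, for example when assigning retention per prefix.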
  45. Grafana application overview
  46. Demo
  47. References:
      https://dropwizard.github.io/metrics/3.1.0/
      https://dl.dropboxusercontent.com/u/2744222/2011-04-09-Metrics-Metrics-Everywhere.pdf
      http://graphite.wikidot.com/
      http://grafana.org/
      http://matt.aimonetti.net/posts/2013/06/26/practical-guide-to-graphite-monitoring/
      https://www.usenix.org/sites/default/files/conference/protected-files/srecon15_slides_limoncelli.pdf
  48. Thank You. http://lanyrd.com/sdkghq @robfrankie franchini@celi.it
