Open Source Logging and Metrics Tools
CapitalCamp and Gov Days 2014
  1. 1. Open Source Logging and Metrics Tools CapitalCamp and Gov Days 2014
  2. 2. Introduction
  3. 3. Director of Engineering, Phase2 Steven Merrill Twitter: @stevenmerrill
  4. 4. About This Talk • Let you visualize your data with OSS tools • Information on customizing logs from common daemons • Strong focus on log aggregation, parsing, and search • Information about drupal.org's logging setup • Some information on performance metrics tools • Two-machine demo of Drupal and logging tools
  5. 5. Demo: ELK Stack in Action
  6. 6. Demo Setup • 2 Google Compute Engine g1-small instances • All instances run collectd to grab system metrics • 1 'drupal' instance with Apache, Varnish, MySQL, PHP • 1 'utility' instance with rsyslog host, Jenkins, Graphite, Grafana, ElasticSearch, Logstash, Kibana, bucky
  7. 7. Logs
  8. 8. Ceci n'est pas une log
  9. 9. “Logs are time + data.” Jordan Sissel, Creator of Logstash
  10. 10. What Are Logs • Ultimately, logs are about keeping track of events • Logs are very different; some use custom formats, while some may be in pure XML or JSON • Some are one line, some are many, like Java stacktraces or MySQL slow query logs
  11. 11. Who Produces Logs • Drupal • nginx • Apache • Varnish • Jenkins • SOLR • MySQL • cron • sudo • ...
  12. 12. Types of Logs • Error Logs • Transaction Logs • Trace Logs • Debug Logs
  13. 13. Issues With Logs • Legal retention requirements • Require shell access to view • Not often human-parseable • Cyborg-friendly tooling
  14. 14. Solving Problems With Log Data • Find slow pages or queries • Sort through Drupal logs to trace user action on a site • Get an average idea of traffic to a particular area • Track new PHP error types
  15. 15. Shipping Logs
  16. 16. Ship Those Logs! • syslog-ng • rsyslogd • Ship syslog • Ship other log files • Lumberjack (logstash-forwarder) • Beaver
  17. 17. Shipping Concerns • Queueing • Behavior when shipping to remote servers • Max spool disk usage • Retries? • Security • Encrypted channel • Encrypted at rest • Access to sensitive data
  18. 18. Configuring rsyslogd Clients • Ship logs to another rsyslog server over TCP • *.* @@utility:514 • This defaults to shipping anything that it would normally log to /var/log/syslog or /var/log/messages
  19. 19. Configuring rsyslogd Servers • Prevent remote logs from showing up in /var/log/messages • if $source != 'utility' then ~ • Store logs coming in based on hostname and date • $template DailyPerHostLogs,"/var/log/rsyslog/%HOSTNAME%/%HOSTNAME%.%$YEAR%-%$MONTH%-%$DAY%.log"
 *.* -?DailyPerHostLogs;RSYSLOG_TraditionalFileFormat
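The DailyPerHostLogs template above yields one directory per host and one file per day. A minimal Python sketch of the resulting path, assuming zero-padded month and day values; the function name and the `base` parameter are illustrative and not part of rsyslog.

```python
from datetime import date

def daily_per_host_path(hostname, day, base="/var/log/rsyslog"):
    """Build the per-host, per-day log path that the template describes."""
    return f"{base}/{hostname}/{hostname}.{day:%Y-%m-%d}.log"

print(daily_per_host_path("drupal", date(2014, 7, 30)))
```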
  20. 20. Configuring rsyslogd Shipping • Read lines from a particular file and ship over syslog • $ModLoad imfile
 $InputFileName /var/log/httpd/access_log
 $InputFileTag apache_access:
 $InputFileStateFile state-apache_access
 $InputFileSeverity info
 $InputFileFacility local0
 $InputFilePollInterval 10
 $InputRunFileMonitor
  21. 21. Configuring rsyslogd Spooling • Configure spooling and queueing behavior • $WorkDirectory /var/lib/rsyslog # where to place spool files
 $ActionQueueFileName fwdRule1 # unique name prefix for spool files
 $ActionQueueMaxDiskSpace 1g # 1gb space limit
 $ActionQueueSaveOnShutdown on # save messages to disk on shutdown
 $ActionQueueType LinkedList # run asynchronously
 $ActionResumeRetryCount -1 # infinite retries if host is down
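The directives above cap spool usage and keep retrying while the remote host is down. The following is a rough in-memory Python sketch of that behavior, not rsyslog's actual implementation; the class `SpoolQueue` and the `send` callback are hypothetical names used only for illustration.

```python
from collections import deque

class SpoolQueue:
    """In-memory stand-in for a size-capped forwarding queue."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used = 0
        self.items = deque()

    def enqueue(self, message):
        """Queue a message unless it would exceed the space limit."""
        size = len(message.encode("utf-8"))
        if self.used + size > self.max_bytes:
            return False  # spool is full; the caller decides what to do
        self.items.append(message)
        self.used += size
        return True

    def flush(self, send):
        """Send queued messages in order; stop at the first failure and keep the rest."""
        sent = 0
        while self.items:
            message = self.items[0]
            if not send(message):
                break  # remote is down; retry on the next flush
            self.items.popleft()
            self.used -= len(message.encode("utf-8"))
            sent += 1
        return sent

queue = SpoolQueue(max_bytes=20)
queue.enqueue("a" * 10)
queue.enqueue("b" * 10)
print(queue.flush(lambda message: False), queue.flush(lambda message: True))
```

Calling `flush` repeatedly until the queue drains approximates the infinite-retry setting; a real shipper would also persist the queue to disk.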
  22. 22. Syslog-shipped Log Files
 Mar 11 15:38:14 drupal drupal: http://192.168.32.3|1394566694|system|192.168.32.1|http://192.168.32.3/admin/modules/list/confirm|http://192.168.32.3/admin/modules|1||php module installed.
 Jul 30 15:04:14 drupal varnish_access: 156.40.118.178 - - [30/Jul/2014:15:04:09 +0000] "GET http://23.251.149.143/misc/tableheader.js?n9j5uu HTTP/1.1" 200 1848 "http://23.251.149.143/admin/modules" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36" 0.000757 miss
  23. 23. Log Formats
  24. 24. Syslog
 Apr 11 18:35:53 shiftiest dnsmasq-dhcp[23185]: DHCPACK(br100) 192.168.32.4 fa:16:3e:c4:2f:fd varnish4
 Mar 11 15:38:14 drupal drupal: http://192.168.32.3|1394566694|system|192.168.32.1|http://192.168.32.3/admin/modules/list/confirm|http://192.168.32.3/admin/modules|1||php module installed.
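The syslog samples above share a layout: timestamp, host, program with an optional PID, then the message. A minimal Python sketch of pulling those fields apart, assuming that traditional layout; the regex and the name `parse_syslog_line` are illustrative and are not taken from rsyslog or Logstash.

```python
import re

# Traditional layout: "Mon D HH:MM:SS host program[pid]: message".
SYSLOG_RE = re.compile(
    r"^(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<program>[^\[:\s]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)

def parse_syslog_line(line):
    """Return a dict of syslog fields, or None if the line does not match."""
    match = SYSLOG_RE.match(line)
    return match.groupdict() if match else None

sample = ("Apr 11 18:35:53 shiftiest dnsmasq-dhcp[23185]: DHCPACK(br100) "
          "192.168.32.4 fa:16:3e:c4:2f:fd varnish4")
fields = parse_syslog_line(sample)
print(fields["host"], fields["program"], fields["pid"])
```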
  25. 25. Apache
 127.0.0.1 - - [08/Mar/2014:00:36:44 -0500] "GET /dashboard HTTP/1.0" 302 20 "https://68.232.187.42/dashboard/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.146 Safari/537.36"
  26. 26. nginx
 192.168.32.1 - - [11/Apr/2014:10:44:36 -0400] "GET /kibana/font/fontawesome-webfont.woff?v=3.2.1 HTTP/1.1" 200 43572 "http://192.168.32.6/kibana/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.116 Safari/537.36"
  27. 27. Varnish
 192.168.32.1 - - [11/Apr/2014:10:47:52 -0400] "GET http://192.168.32.3/themes/seven/images/list-item.png HTTP/1.1" 200 195 "http://192.168.32.3/admin/config" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.116 Safari/537.36"
  28. 28. Additional Features • Apache, nginx, and Varnish all support additional output • Varnish can log cache hit/miss • With Logstash we can look at how to normalize these • A regex engine with built-in named patterns • Online tools to parse sample logs
  29. 29. Apache • Configurable log formats are available – http://httpd.apache.org/docs/2.2/mod/mod_log_config.html • A single LogFormat directive in any Apache configuration file will override all log formats • The default NCSA combined log format is as follows:
 LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined
  30. 30. Apache • Additional useful information: • %D Time taken to serve request in microseconds • %{Host}i Value of the Host HTTP header • %p Port • New LogFormat line:
 LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %D %{Host}i %p" combined
  31. 31. nginx • Log formats are defined with the log_format directive – http://nginx.org/en/docs/http/ngx_http_log_module.html#log_format • You may not override the default NCSA combined format • log_format combined '$remote_addr - $remote_user [$time_local] '
 '"$request" $status $body_bytes_sent '
 '"$http_referer" "$http_user_agent"';
  32. 32. Apache
 127.0.0.1 - - [29/Jul/2014:22:03:07 +0000] "GET /admin/config/development/performance HTTP/1.0" 200 3500 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36"
 127.0.0.1 - - [29/Jul/2014:22:03:07 +0000] "GET /admin/config/development/performance HTTP/1.0" 200 3500 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36" 45304 23.251.149.143 80
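The second sample line carries the three extra fields from the extended LogFormat (%D, %{Host}i, %p). A rough Python sketch of reading them, assuming exactly that field order; the regex and group names are illustrative and would need adjusting for any other LogFormat.

```python
import re

# Combined log format followed by duration (microseconds), Host header, and port.
COMBINED_PLUS_RE = re.compile(
    r'^(?P<client>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)" '
    r'(?P<duration_us>\d+) (?P<vhost>\S+) (?P<port>\d+)$'
)

line = ('127.0.0.1 - - [29/Jul/2014:22:03:07 +0000] '
        '"GET /admin/config/development/performance HTTP/1.0" 200 3500 "-" '
        '"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 '
        '(KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36" '
        '45304 23.251.149.143 80')

fields = COMBINED_PLUS_RE.match(line).groupdict()
# %D is in microseconds, so divide by 1000 to get milliseconds.
duration_ms = int(fields["duration_us"]) / 1000.0
print(fields["vhost"], fields["port"], duration_ms)
```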
  33. 33. nginx • Additional useful information: • $request_time Time taken to serve request in seconds with millisecond resolution (e.g. 0.073) • $http_host Value of the Host HTTP header • $server_port Port
  34. 34. nginx • New log_format line and example config for a vhost: • log_format logstash '$remote_addr - $remote_user [$time_local] '
 '"$request" $status $body_bytes_sent '
 '"$http_referer" "$http_user_agent" '
 '$request_time $http_host $server_port'; • access_log /var/log/nginx/access.log logstash;
  35. 35. nginx
 70.42.157.6 - - [22/Jul/2014:22:03:30 +0000] "POST /logstash-2014.07.22/_search HTTP/1.0" 200 281190 "http://146.148.34.62/kibana/index.html" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36"
 70.42.157.6 - - [22/Jul/2014:22:03:30 +0000] "POST /logstash-2014.07.22/_search HTTP/1.0" 200 281190 "http://146.148.34.62/kibana/index.html" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36" 0.523 146.148.34.62 80
  36. 36. Varnish • The varnishncsa daemon outputs NCSA-format logs • You may pass a different log format to the varnishncsa daemon; many of its format specifiers are the same as Apache's
  37. 37. Varnish • Additional useful information: • %D Time taken to serve request in seconds with microsecond precision (e.g. 0.000884) • %{Varnish:hitmiss}x The text "hit" or "miss" • varnishncsa daemon argument:
 -F '%h %l %u %t "%r" %s %b "%{Referer}i" "%{User-agent}i" %D %{Varnish:hitmiss}x'
  38. 38. Varnish
 70.42.157.6 - - [29/Jul/2014:22:03:07 +0000] "GET http://23.251.149.143/admin/config/development/performance HTTP/1.0" 200 3500 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36"
 70.42.157.6 - - [29/Jul/2014:22:03:07 +0000] "GET http://23.251.149.143/admin/config/development/performance HTTP/1.0" 200 3500 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36" 0.045969 miss
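With the hit/miss marker as the last field of each line, a cache hit ratio can be computed straight from the log. A small Python sketch, assuming the marker is always the final whitespace-separated token; the function name and the shortened input lines are illustrative, not real log data.

```python
def cache_hit_ratio(lines):
    """Return hits / (hits + misses), judged by the last field of each line."""
    hits = misses = 0
    for line in lines:
        parts = line.split()
        last = parts[-1] if parts else ""
        if last == "hit":
            hits += 1
        elif last == "miss":
            misses += 1
    total = hits + misses
    return hits / total if total else 0.0

# Shortened, made-up line tails; real lines carry the full request.
tails = [
    '"-" "agent" 0.045969 miss',
    '"-" "agent" 0.000120 hit',
    '"-" "agent" 0.000098 hit',
    '"-" "agent" 0.000757 miss',
]
print(cache_hit_ratio(tails))
```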
  39. 39. Automated Tools
  40. 40. Proprietary Tools • Third-party SaaS systems are plentiful in this area • Splunk • SumoLogic • Loggly • LogEntries
  41. 41. Logstash • http://logstash.net/ • Great tool to work with logs of ALL sorts • Has input, filter, and output pipelines • Inputs can be parsed with different codecs (JSON, netflow) • http://logstash.net/docs/1.4.2/ describes many options
  42. 42. ElasticSearch • http://www.elasticsearch.com/ • A Java search engine based on Lucene, similar to SOLR • Offers a nicer REST API; easy discovery for clustering
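The nginx sample earlier shows Kibana posting to /logstash-2014.07.22/_search, which suggests one index per day. A minimal Python sketch of building such a request path and a JSON body, assuming that daily naming scheme; the helper names are illustrative and the query text `program:nginx_access` is only an example.

```python
import json
from datetime import date

def logstash_search_path(day):
    """Search path for one daily index, e.g. /logstash-2014.07.22/_search."""
    return f"/logstash-{day:%Y.%m.%d}/_search"

def query_body(query_string, size=10):
    """JSON body for a query_string search; the query text is caller-supplied."""
    return json.dumps({
        "query": {"query_string": {"query": query_string}},
        "size": size,
    })

path = logstash_search_path(date(2014, 7, 22))
body = query_body("program:nginx_access")
print(path)
```

The body would be sent with an HTTP POST to the ElasticSearch host; the sketch stops short of that so it runs without a server.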
  43. 43. Kibana • Great viewer for Logstash logs • Needs direct HTTP access to ElasticSearch • You may need to protect this with nginx or the like • Uses ElasticSearch features to show statistical information • Can show any ElasticSearch data, not just Logstash
  44. 44. Grok • Tool for pulling semantic data from logs; logstash filter • A regex engine with built-in named patterns • Online tools to parse sample logs • http://grokdebug.herokuapp.com/ • http://grokconstructor.appspot.com/
  45. 45. Example: Grokking nginx Logs
 192.168.32.1 - - [11/Apr/2014:10:44:36 -0400] "GET /kibana/font/fontawesome-webfont.woff?v=3.2.1 HTTP/1.1" 200 43572 "http://192.168.32.6/kibana/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko)
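Grok's core idea, named patterns that expand into a regex with named captures, can be sketched in a few lines of Python. This is a toy illustration under simplifying assumptions: the `PATTERNS` table holds simplified stand-ins, not grok's real pattern definitions, and `grok_to_regex` is a hypothetical helper.

```python
import re

# Simplified stand-ins for a few named patterns (not grok's actual definitions).
PATTERNS = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "NUMBER": r"\d+(?:\.\d+)?",
    "WORD": r"\w+",
    "NOTSPACE": r"\S+",
}

def grok_to_regex(expression):
    """Expand %{NAME:field} into a named group and %{NAME} into a plain group."""
    def expand(match):
        name, field = match.group(1), match.group(2)
        body = PATTERNS[name]
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return re.compile(re.sub(r"%\{(\w+)(?::(\w+))?\}", expand, expression))

pattern = grok_to_regex(
    r'%{IP:client} - - \[[^\]]+\] "%{WORD:verb} %{NOTSPACE:path} '
    r'HTTP/%{NUMBER:http_version}" %{NUMBER:status} %{NUMBER:bytes}'
)
line = ('192.168.32.1 - - [11/Apr/2014:10:44:36 -0400] '
        '"GET /kibana/font/fontawesome-webfont.woff?v=3.2.1 HTTP/1.1" 200 43572')
event = pattern.match(line).groupdict()
print(event["client"], event["verb"], event["status"], event["bytes"])
```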
  46. 46. Configuring Logstash
  47. 47. Logstash Config • By default Logstash looks in /etc/logstash/conf.d/*.conf • You may include multiple files • Each must have at least an input, filter, or output stanza
  48. 48. Logstash Config
 input {
   file {
     path => "/var/log/rsyslog/*/*.log"
     exclude => "*.bz2"
     type => syslog
     sincedb_path => "/var/run/logstash/sincedb"
     sincedb_write_interval => 10
   }
 }
  49. 49. Logstash Config
 filter {
   if [type] == "syslog" {
     mutate {
       add_field => [ "syslog_message", "%{message}" ]
       remove_field => "message"
     }
     grok {
       match => [ "syslog_message", "%{SYSLOGLINE}" ]
     }
     date {
       match => [ "timestamp", "MMM d HH:mm:ss", "MMM dd HH:mm:ss" ]
     }
     # Parse Drupal logs that are logged to syslog.
  50. 50. Logstash Config
     date {
       match => [ "timestamp", "MMM d HH:mm:ss", "MMM dd HH:mm:ss" ]
     }
     if [program] == "drupal" {
       grok {
         match => [ "message", "https?://%{HOSTNAME:vhost}?\|%{NUMBER:d_timestamp}\|(?<d_type>[^|]*)\|%{IP:d_ip}\|(?<d_request_uri>[^|]*)\|(?<d_referer>[^|]*)\|(?<d_uid>[^|]*)\|(?<d_link>[^|]*)\|(?<d_message>.*)" ]
       }
     }
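The Drupal grok pattern above treats the message as nine pipe-separated fields. The same split can be sketched in Python, assuming the field order shown in the sample logs; the field names mirror the grok captures except `base_url`, which is an illustrative name for the leading URL.

```python
DRUPAL_FIELDS = [
    "base_url", "d_timestamp", "d_type", "d_ip", "d_request_uri",
    "d_referer", "d_uid", "d_link", "d_message",
]

def parse_drupal_message(message):
    """Split a Drupal syslog message into named fields, or return None."""
    # Limit the split so pipes inside the final message text are preserved.
    parts = message.split("|", len(DRUPAL_FIELDS) - 1)
    if len(parts) != len(DRUPAL_FIELDS):
        return None
    return dict(zip(DRUPAL_FIELDS, parts))

message = ("http://192.168.32.3|1394566694|system|192.168.32.1|"
           "http://192.168.32.3/admin/modules/list/confirm|"
           "http://192.168.32.3/admin/modules|1||php module installed.")
event = parse_drupal_message(message)
print(event["d_type"], event["d_uid"], event["d_message"])
```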
  51. 51. Logstash Config
     if [program] == "nginx_access" {
       ruby {
         code => "event['duration'] = event['duration'].to_f * 1000.0"
       }
     }
     if [program] == "varnish_access" {
       ruby {
         code => "event['duration'] = event['duration'].to_f * 1000.0"
       }
     }
   }
 }
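The ruby filters above convert nginx and Varnish durations from seconds to milliseconds so that durations share one unit. A Python sketch of the same normalization follows; the Apache entry (microseconds to milliseconds) is an assumption added for illustration, since the config shown does not include it, and `normalize_duration` is a hypothetical helper.

```python
# Multipliers that convert each program's native duration unit to milliseconds.
TO_MILLISECONDS = {
    "nginx_access": 1000.0,    # $request_time is in seconds
    "varnish_access": 1000.0,  # %D is logged in seconds in these samples
    "apache_access": 0.001,    # assumption: Apache %D is in microseconds
}

def normalize_duration(event):
    """Return a copy of the event with 'duration' expressed in milliseconds."""
    factor = TO_MILLISECONDS.get(event.get("program"))
    if factor is None or "duration" not in event:
        return event
    return dict(event, duration=float(event["duration"]) * factor)

print(normalize_duration({"program": "nginx_access", "duration": "0.523"}))
```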
  52. 52. Monitoring and Performance Metrics
  53. 53. Logs vs Performance Counters • Generally, logs capture data at a particular time • You may also want to keep information about how your servers are running and performing • A separate set of tools is often used to help monitor and manage system performance • This data can then be trended to chart resource usage and capacity
  54. 54. Proprietary Tools • Third-party SaaS systems are also plentiful in this area • DataDog • Librato Metrics • Circonus • New Relic / AppNeta
  55. 55. Time-Series Data • Performance counters are generally sampled at a regular interval; the result is known as time-series data • Several OSS tools exist to store and query time-series data: • RRDTool • Whisper • InfluxDB
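Time-series stores typically keep one value per fixed interval. A small Python sketch of consolidating raw samples into per-interval averages, offered as an illustration of the idea rather than how RRDTool, Whisper, or InfluxDB actually store data; the function name is hypothetical.

```python
from collections import defaultdict

def consolidate(samples, interval):
    """Average (timestamp, value) samples into buckets of `interval` seconds."""
    buckets = defaultdict(list)
    for timestamp, value in samples:
        # Align each sample to the start of its interval.
        buckets[timestamp - timestamp % interval].append(value)
    return {start: sum(values) / len(values)
            for start, values in sorted(buckets.items())}

samples = [(0, 1.0), (10, 3.0), (60, 5.0)]
print(consolidate(samples, 60))
```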
  56. 56. First Wave: RRD-based Tools • Many tools can collect metrics, then create and plot RRD files • Munin • Cacti • Ganglia • collectd
  57. 57. Second Wave: Graphite • Graphite is a more general tool; it does not collect metrics • It uses an advanced storage engine called Whisper • It can buffer data and cache it under heavy load • It does not require data to be inserted all the time • It's fully designed to take time-series data and graph it
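Because Graphite does not collect metrics itself, something has to push data to it. Its plaintext protocol takes one "path value timestamp" line per data point. A minimal Python sketch of formatting such a line; the metric path is a made-up example and `graphite_line` is a hypothetical helper.

```python
import time

def graphite_line(path, value, timestamp=None):
    """Format one data point for Graphite's plaintext protocol."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"

# Hypothetical metric path; real paths depend on how the collector names metrics.
print(graphite_line("servers.drupal.load.shortterm", 0.5, 1406671454), end="")
```

A collector would write these lines to Carbon's plaintext listener over TCP (port 2003 by default); the sketch leaves out the socket so it runs without a Graphite server.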
  58. 58. Grafana • Grafana is to Graphite as Kibana is to ElasticSearch • HTML / JavaScript app • Needs direct HTTP access to Graphite • You may need to protect this with nginx or the like
  59. 59. Collectd • http://collectd.org/ • Collectd is a tool that makes it easy to capture many system-level statistics • It can write to RRD databases or to Graphite • Collectd is written in C and is efficient; it can remain resident in memory and report on a regular interval
  60. 60. Demo: Graphite / collectd / Grafana
  61. 61. The Drupal.org Logging Setup
  62. 62. Single Log Host Machine • CentOS 5 • Dual quad-core Gulftown Xeons (8 cores, 16 threads) • 16 GB RAM • 600 GB of HDD storage dedicated to Logstash
  63. 63. Software • ElasticSearch 0.90 • Logstash 1.2 • Kibana 3.0.0m3 • Curator 0.6.2
  64. 64. Stats • Consolidating logs from ≈ 10 web servers • Incoming syslog (Drupal), Apache, nginx, and Varnish logs • Non-syslog logs are updated every hour with rsync • > 2 billion logs processed per month • Indexing is spiky but not constant; load average of 0.5
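To put the monthly volume in perspective, the figure can be turned into an average rate. A quick Python calculation, assuming a 30-day month and treating 2 billion as a lower bound; it says nothing about peak load, which the slide notes is spiky.

```python
LOGS_PER_MONTH = 2_000_000_000          # lower bound from the stats above
SECONDS_PER_MONTH = 30 * 24 * 60 * 60   # assumes a 30-day month

average_per_second = LOGS_PER_MONTH / SECONDS_PER_MONTH
print(round(average_per_second))  # roughly 772 log events per second
```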
  65. 65. Questions?
  66. 66. Resources
  67. 67. Links • http://logstash.net/ • http://elasticsearch.com/ • https://github.com/elasticsearch/kibana/ • http://graphite.wikidot.com/ • http://grafana.org/
  68. 68. Links • https://collectd.org/ • https://www.drupal.org/documentation/modules/syslog • https://github.com/elasticsearch/logstash-forwarder
  69. 69. PHASE2TECHNOLOGY.COM
