Take My Logs. Please!

Details on how we capture application data in our access and error logs, as well as how to generate quick reports and graphs from these logs.

This talk was presented at O'Reilly's Velocity Online Conference on October 26, 2011.

Published in: Technology

  1. Take my logs. Please. Mike Brittain, Director of Engineering, Infrastructure, Etsy.com. mike@etsy.com, @mikebrittain
  2. (hello?)
  3. This sounds boooooorrrrring... No, no... hang in there!
  4. 25 MM uniques/month; 150 countries; $300 MM+ sales last year
  5. Apache, PHP, MySQL, PostgreSQL, Memcache, Gearman, Solr, etc.
  6. What’s working?
  7. What’s working? Performance
  8. What’s working? Performance, Operability
  9. What’s working? Performance, Operability, Simplicity
  10. Logging + Trending
  11. App logging (Apache access and error logs)
  12. “Common”: LogFormat "%h %l %u %t \"%r\" %>s %b"
  13. “Combined”: LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\""
  14. mod_log_config: %f = filename requested; %k = number of keepalive requests served on this connection; %T = time taken to serve the request, in seconds
  15. mod_log_config: %f = filename requested; %k = number of keepalive requests served on this connection; %D = time taken to serve the request, in microseconds
  16. mod_log_config: %f = filename requested; %k = number of keepalive requests served on this connection; %D = time taken to serve the request, in microseconds; %{foobar}n = contents of note “foobar” from another module
  17. apache_note(): apache_note("foobar", $whatever);
  18. “Steroids”: LogFormat "%{True-Client-IP}i %l %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %V %{user_id}n %{shop_id}n %{uaid}n %{ab_selections}n %{request_uid}n %{api_consumer_key}n %{api_method_name}n %{php_bytes}n %{php_microsec}n %D"
  19. “Steroids”: LogFormat "%{True-Client-IP}i %l %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %V %{user_id}n %{shop_id}n %{uaid}n %{ab_selections}n %{request_uid}n %{api_consumer_key}n %{api_method_name}n %{php_bytes}n %{php_microsec}n %D"
  20. $GLOBALS['timer'] = microtime(true) * 1000000;
  21. $GLOBALS['timer'] = microtime(true) * 1000000;
      register_shutdown_function('pageStats');
      function pageStats() {}
  22. $GLOBALS['timer'] = microtime(true) * 1000000;
      register_shutdown_function('pageStats');
      function pageStats() {
          $timer_end = microtime(true) * 1000000;
          $diff = $timer_end - $GLOBALS['timer'];
      }
  23. $GLOBALS['timer'] = microtime(true) * 1000000;
      register_shutdown_function('pageStats');
      function pageStats() {
          $timer_end = microtime(true) * 1000000;
          $diff = $timer_end - $GLOBALS['timer'];
          apache_note('php_microsec', $diff);
          apache_note('php_bytes', memory_get_peak_usage());
      }
  24. What about “%D”?
  25. “Steroids”: LogFormat "%{True-Client-IP}i %l %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %V %{user_id}n %{shop_id}n %{uaid}n %{ab_selections}n %{request_uid}n %{api_consumer_key}n %{api_method_name}n %{php_bytes}n %{php_microsec}n %D"
  26. “Steroids”: LogFormat "%{True-Client-IP}i %l %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %V %{user_id}n %{shop_id}n %{uaid}n %{ab_selections}n %{request_uid}n %{api_consumer_key}n %{api_method_name}n %{php_bytes}n %{php_microsec}n %D"
  27. “Steroids”: LogFormat "%{True-Client-IP}i %l %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %V %{user_id}n %{shop_id}n %{uaid}n %{ab_selections}n %{request_uid}n %{api_consumer_key}n %{api_method_name}n %{php_bytes}n %{php_microsec}n %D"
  28. “Steroids”: LogFormat "%{True-Client-IP}i %l %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %V %{user_id}n %{shop_id}n %{uaid}n %{ab_selections}n ..."
      Example %{ab_selections}n value: easy_reg=1; personalize_widget=0; icon_in_cornflower_blue=1;
  29. Coming soon... %{locale}n (i18n), %{platform}n (desktop vs. mobile)
  30. Coming soon... %{locale}n (i18n), %{platform}n (desktop vs. mobile). OPS-1805, OPS-1827. etsy.com/careers
  31. Using something else? Log the same things: time, HTTP method, request URI, response code, referer, user-agent, response time, response memory, custom segmentation fields...
  32. Quick averages:
      grep "GET /listing/" access.log | awk '{sum=sum+$(NF-1)} END {print sum/NR}'
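
For readers who would rather script this than chain shell tools, here is a minimal Python sketch of the same idea: filter lines by a substring, then average the second-to-last whitespace-separated field (the position of %{php_microsec}n in the “Steroids” format). The function name `average_field` and the abbreviated sample lines are hypothetical, made up for illustration. Unlike the awk version, it skips lines where the field is not numeric.

```python
def average_field(lines, needle="GET /listing/", field=-2):
    """Average one whitespace-separated field over lines containing `needle`."""
    total = 0.0
    count = 0
    for line in lines:
        if needle not in line:
            continue
        parts = line.split()
        try:
            total += float(parts[field])
        except (IndexError, ValueError):
            continue  # field missing or not numeric (for example "-")
        count += 1
    return total / count if count else None


# Hypothetical, abbreviated log lines: the last two fields stand in for
# %{php_microsec}n and %D.
sample = [
    '1.2.3.4 - [t] "GET /listing/1 HTTP/1.1" 200 512 1000 250 300',
    '1.2.3.4 - [t] "GET /listing/2 HTTP/1.1" 200 512 1000 350 400',
    '1.2.3.4 - [t] "GET /search HTTP/1.1" 200 512 1000 900 950',
]
print(average_field(sample))  # averages 250 and 350
```

Passing `field=-1` instead would average %D, assuming it is the last field on the line.
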
  33. Quick graphs:
      grep "GET /listing/" access.log | perl -pe "s/.*\[.*\d{4}:(\d{2}):(\d{2}):\d{2}.*\]/\1:\2/" | awk '{print $1, $(NF-1)}' > /tmp/pagetimes.dat
      gives you...
  34. Quick graphs:
      # /tmp/pagetimes.dat
      18:37 251.0
      18:38 252.1
      18:39 253.5
      18:40 251.0
      18:45 250.0
      and then...
  35. Quick graphs:
      # GNUPLOT
      set terminal png
      set output "listings.png"
      set yrange [0:2000]
      set xdata time
      set timefmt "%H:%M"    # matches the HH:MM column in /tmp/pagetimes.dat
      set format x "%H:%M"
      plot "/tmp/pagetimes.dat" using 1:2 with points
  36. Quick graphs
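
The extraction step of the pipeline above (timestamp to HH:MM, plus one timing field) can also be sketched in Python. This is an illustration under assumptions, not the talk's tooling: the name `page_times`, the timestamp regex, and the abbreviated sample lines are all hypothetical, and the timing value is assumed to be the second-to-last field.

```python
import re

# Matches an Apache-style timestamp such as [04/Mar/2011:18:37:02 -0500]
# and captures the hour and the minute.
TIMESTAMP = re.compile(r"\[\d{2}/\w{3}/\d{4}:(\d{2}):(\d{2}):\d{2}[^\]]*\]")


def page_times(lines, needle="GET /listing/", field=-2):
    """Return (HH:MM, value) pairs for lines containing `needle`."""
    rows = []
    for line in lines:
        if needle not in line:
            continue
        match = TIMESTAMP.search(line)
        parts = line.split()
        if match is None or len(parts) < abs(field):
            continue  # no timestamp, or too few fields to index
        rows.append(("%s:%s" % match.groups(), parts[field]))
    return rows


# Hypothetical, abbreviated access-log lines.
sample = [
    '1.2.3.4 - [04/Mar/2011:18:37:02 -0500] "GET /listing/1 HTTP/1.1" 200 512 251.0 300',
    '1.2.3.4 - [04/Mar/2011:18:38:40 -0500] "GET /listing/2 HTTP/1.1" 200 512 252.1 310',
    '1.2.3.4 - [04/Mar/2011:18:38:41 -0500] "GET /search HTTP/1.1" 200 512 900.0 950',
]
for minute, value in page_times(sample):
    print(minute, value)
```

Writing those pairs to a file, one per line, yields the same two-column layout that the gnuplot script reads.
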
  37. Error logs: PHP + Apache errors in one file; simple logging interface
  38. Error logs. Levels: error, info, debug. Namespace: perf, sql, __class__
  39. Logger::error("Query exceeded 5 sec: $query", "sql_long_query");
  40. web0054 [Fri Mar 04 16:27:48 2011] [error] [sql_long_query] [mk04gw1p71] Query exceeded 5 sec: SELECT * FROM ...
  41. web0054 [Fri Mar 04 16:27:48 2011] [error] [sql_long_query] [mk04gw1p71] Query exceeded 5 sec: SELECT * FROM ...
  42. $ grep "16:27:48" access.log | wc -l
      1527
  43. web0054 [Fri Mar 04 16:27:48 2011] [error] [sql_long_query] [mk04gw1p71] Query exceeded 5 sec: SELECT * FROM ...
  44. In other words: error.log -> request_uid -> access.log, which gives you the request URI, A/B selections, user ID, locale, platform, API key, etc.
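
A rough Python sketch of that lookup follows. It assumes the error-log layout shown on the slides (host, timestamp, level, namespace, request_uid, message) and that the request_uid appears as its own whitespace-separated field in the access log. The names `ERROR_LINE`, `request_uid`, and `matching_requests`, along with the abbreviated access-log lines, are hypothetical.

```python
import re

# Error-log layout as shown on the slides:
# host [timestamp] [level] [namespace] [request_uid] message
ERROR_LINE = re.compile(
    r"^\S+ \[[^\]]+\] \[(?P<level>[^\]]+)\] "
    r"\[(?P<namespace>[^\]]+)\] \[(?P<request_uid>[^\]]+)\]"
)


def request_uid(error_line):
    """Pull the request_uid out of one error-log line, or return None."""
    match = ERROR_LINE.match(error_line)
    return match.group("request_uid") if match else None


def matching_requests(uid, access_lines):
    """Return the access-log lines that carry `uid` as a whole field."""
    return [line for line in access_lines if uid in line.split()]


error = ("web0054 [Fri Mar 04 16:27:48 2011] [error] [sql_long_query] "
         "[mk04gw1p71] Query exceeded 5 sec: SELECT * FROM ...")

# Hypothetical, abbreviated access-log lines.
access = [
    '1.2.3.4 - [04/Mar/2011:16:27:48 -0500] "GET /listing/1 HTTP/1.1" 200 512 mk04gw1p71 - - 1000 250 300',
    '5.6.7.8 - [04/Mar/2011:16:27:48 -0500] "GET /search HTTP/1.1" 200 512 zz99aa0b11 - - 1000 900 950',
]

uid = request_uid(error)
for line in matching_requests(uid, access):
    print(line)
```

Matching on a whole field rather than a substring avoids false hits when one identifier happens to be contained in a longer one.
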
  45. Filtering: tail -f error.log | grep -v "sql_long_query" | ...
  46. web0001 [04:28:54 2011] [error] [client 10.101.x.x] Help me, Rhonda.
      web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!
      web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!
      web0001 [04:28:54 2011] [error] [client 10.101.x.x] Heeeeeeellllllllllllllppp
      web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!
      web0001 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!
      web0201 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
      web0034 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo
      web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
      web1101 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
      web0201 [04:28:54 2011] [error] [client 10.101.x.x] You've been eaten by a gr
      web0055 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!!!
      web0002 [04:28:54 2011] [warning] [client 10.101.x.x] Sky is falling.
      web0089 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
      web0020 [04:28:54 2011] [error] [client 10.101.x.x] Sky is falling.
      web1101 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!
      web0055 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
      web0001 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo
      web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
      web0034 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
      web0087 [04:28:54 2011] [fatal] [client 10.101.x.x] Sky is falling.
      web0002 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!
      web0201 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!
      web0077 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
      web0355 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo
      web0052 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
      web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
      web0003 [04:28:54 2011] [error] [client 10.101.x.x] You've been eaten by a gr
      web0066 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!!!
      web0001 [04:28:54 2011] [warning] [client 10.101.x.x] Sky is falling
      web0020 [04:28:54 2011] [error] [client 10.101.x.x] Sky is falling.
      web1101 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!
      web0055 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
      web0001 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo
      web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
  47. Trending: fatals, errors, warnings
  48. Logster: run by cron; maintains a cursor on log files; simple parsing & aggregation; output to Ganglia or Graphite. github.com/etsy
  49. web0054 [Fri Mar 04 16:27:48 2011] [error] [login] [mk04gw1p71] User login failed. Reason: wrong password for ...
  50. ^.+? \[.+?\] \[(?P<log_level>.+?)\]
  51. if (fields['log_level'] == "fatal"):
          self.fatals += 1
      elif (fields['log_level'] == "error"):
          self.errors += 1
      elif (fields['log_level'] == "warning"):
          self.warnings += 1
      ...
  52. MetricObject("fatals", (self.fatals / self.duration), "per sec")
      MetricObject("errors", (self.errors / self.duration), "per sec")
      MetricObject("warning", (self.warnings / self.duration), "per sec")
  53. fatals errors warnings
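
To make the counting-and-rate step concrete, here is a standalone Python sketch in the spirit of the parser fragments above. It deliberately does not use Logster's own classes: `LevelCounter`, `parse_line`, and `rates` are hypothetical names, and the line pattern is an assumption based on the error-log samples in this deck.

```python
import re

# host [timestamp] [level] ...  (layout assumed from the sample log lines)
LINE = re.compile(r"^\S+ \[[^\]]+\] \[(?P<log_level>[^\]]+)\]")


class LevelCounter:
    """Counts log levels and reports them as per-second rates."""

    LEVELS = ("fatal", "error", "warning")

    def __init__(self):
        self.counts = dict.fromkeys(self.LEVELS, 0)

    def parse_line(self, line):
        """Increment the counter for this line's level, if it is tracked."""
        match = LINE.match(line)
        if match and match.group("log_level") in self.counts:
            self.counts[match.group("log_level")] += 1

    def rates(self, duration):
        """Return {metric name: events per second} over `duration` seconds."""
        return {level + "s": count / duration
                for level, count in self.counts.items()}


sample = [
    "web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!",
    "web0001 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!",
    "web0002 [04:28:54 2011] [error] [client 10.101.x.x] Sky is falling.",
    "a line that does not match the expected layout",
]

counter = LevelCounter()
for line in sample:
    counter.parse_line(line)
print(counter.rates(duration=2.0))
```

Dividing by the elapsed time between runs is what turns raw counts into a rate that can be graphed over time, whatever the polling interval.
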
  54. Logster Signed-in vs. Signed-out
  55. github.com/etsy
  56. Log a plethora of data. Don’t be afraid to use one file.
  57. Use custom fields to segment data.
  58. Correlate errors to specific requests.
  59. Make f#@k!ng graphs.
  60. Convert rates to trend lines.
  61. Take my logs. Please!
  62. Thank you. codeascraft.etsy.com, github.com/etsy. Mike Brittain, Director of Engineering, Infrastructure, Etsy.com. mike@etsy.com, @mikebrittain
