Take My Logs. Please!

Details on how we capture application data in our access and error logs, as well as how to generate quick reports and graphs from these logs.

This talk was presented at O'Reilly's Velocity Online Conference on October 26, 2011.

Transcript

  • 1. Take my logs. Please. Mike Brittain, Director of Engineering, Infrastructure, Etsy.com. mike@etsy.com @mikebrittain
  • 2. (hello?)
  • 3. This sounds boooooorrrrring... No, no... hang in there!
  • 4. 25 MM uniques/month; 150 countries; $300 MM+ sales last year
  • 5. Apache, PHP, MySQL, PostgreSQL, Memcache, Gearman, Solr, etc.
  • 6. What’s working?
  • 7. What’s working? Performance
  • 8. What’s working? Performance, Operability
  • 9. What’s working? Performance, Operability, Simplicity
  • 10. Logging + Trending
  • 11. App logging (Apache access and error logs)
  • 12. “Common” LogFormat "%h %l %u %t \"%r\" %>s %b"
  • 13. “Combined” LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\""
  • 14. mod_log_config
      %f          Filename requested
      %k          # of keepalive requests served on this connection
      %T          Time taken to serve the request, in seconds
  • 15. mod_log_config
      %f          Filename requested
      %k          # of keepalive requests served on this connection
      %D          Time taken to serve the request, in microseconds
  • 16. mod_log_config
      %f          Filename requested
      %k          # of keepalive requests served on this connection
      %D          Time taken to serve the request, in microseconds
      %{foobar}n  Contents of “note” foobar from another module
  • 17. apache_note(): apache_note("foobar", $whatever);
  • 18. “Steroids” LogFormat %{True-Client-IP}i %l %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" %V %{user_id}n %{shop_id}n %{uaid}n %{ab_selections}n %{request_uid}n %{api_consumer_key}n %{api_method_name}n %{php_bytes}n %{php_microsec}n %D
  • 20. $GLOBALS['timer'] = microtime(true) * 1000000;
  • 21. $GLOBALS['timer'] = microtime(true) * 1000000;
      register_shutdown_function('pageStats');
      function pageStats() {
      }
  • 22. $GLOBALS['timer'] = microtime(true) * 1000000;
      register_shutdown_function('pageStats');
      function pageStats() {
          $timer_end = microtime(true) * 1000000;
          $diff = $timer_end - $GLOBALS['timer'];
      }
  • 23. // Request start time, in microseconds.
      $GLOBALS['timer'] = microtime(true) * 1000000;
      // Run pageStats() when the script finishes.
      register_shutdown_function('pageStats');
      function pageStats() {
          $timer_end = microtime(true) * 1000000;
          $diff = $timer_end - $GLOBALS['timer'];
          // Expose PHP time and peak memory to Apache as notes for the access log.
          apache_note('php_microsec', $diff);
          apache_note('php_bytes', memory_get_peak_usage());
      }
  • 24. What about “%D”?
  • 28. “Steroids” LogFormat %{True-Client-IP}i %l %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" %V %{user_id}n %{shop_id}n %{uaid}n %{ab_selections}n ...
      easy_reg=1; personalize_widget=0; icon_in_cornflower_blue=1;
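
The ab_selections value on slide 28 is a semicolon-separated list of name=value pairs. As a minimal sketch of how such a field could be split for segmentation, assuming the value always follows that shape (the function name `parse_ab_selections` is hypothetical and not from the talk):

```python
def parse_ab_selections(raw):
    """Split 'name=value; name=value;' into a dict, skipping empty pieces.

    Assumption: pairs are separated by ';' and each pair has the form
    name=value, as in the sample shown on slide 28.
    """
    selections = {}
    for piece in raw.split(";"):
        piece = piece.strip()
        if not piece:
            continue  # trailing ';' leaves an empty piece
        name, _, value = piece.partition("=")
        selections[name.strip()] = value.strip()
    return selections


sample = "easy_reg=1; personalize_widget=0;icon_in_cornflower_blue=1;"
for name, value in parse_ab_selections(sample).items():
    print(name, value)
```

With the selections in a dict, requests can be grouped by experiment variant before averaging response times.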
  • 29. Coming soon... %{locale}n (i18n), %{platform}n (desktop vs. mobile)
  • 30. Coming soon... %{locale}n (i18n), %{platform}n (desktop vs. mobile). OPS-1805, OPS-1827. etsy.com/careers
  • 31. Using something else? time, http method, request uri, response code, referer, user-agent, response time, response memory, custom segmentation fields...
  • 32. Quick averages: grep "GET /listing/" access.log | awk '{sum=sum+$(NF-1)} END {print sum/NR}'
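
For anyone who prefers a script to a one-liner, the average on slide 32 could be sketched in Python as follows. This is an illustration, not the talk's tooling. It assumes the second-to-last whitespace-separated field holds the timing value (php_microsec in the “Steroids” format), and unlike the awk version it skips lines whose value is not numeric instead of counting them as zero.

```python
def average_penultimate_field(lines, needle="GET /listing/"):
    """Average the second-to-last field of every line containing `needle`.

    Returns None when no line contributes a numeric value.
    """
    total = 0.0
    count = 0
    for line in lines:
        if needle not in line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            total += float(fields[-2])
        except ValueError:
            continue  # non-numeric value, e.g. "-" when the note was never set
        count += 1
    return total / count if count else None


# Hypothetical usage against a real file:
#     with open("access.log") as fh:
#         print(average_penultimate_field(fh))
```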
  • 33. Quick graphs
      grep "GET /listing/" access.log | perl -pe 's/.*\[.*\d{4}:(\d{2}):(\d{2}):\d{2}.*\]/\1:\2/' | awk '{print $1, $(NF-1)}' > /tmp/pagetimes.dat
      gives you...
  • 34. Quick graphs
      # /tmp/pagetimes.dat
      18:37 251.0
      18:38 252.1
      18:39 253.5
      18:40 251.0
      18:45 250.0
      and then...
  • 35. Quick graphs
      # GNUPLOT
      set terminal png
      set output "listings.png"
      set yrange [0:2000]
      set xdata time
      set timefmt "%H:%M"
      set format x "%H:%M"
      plot "/tmp/pagetimes.dat" using 1:2 with points
  • 36. Quick graphs
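
The grep, perl, and awk pipeline on slide 33 reduces each access log line to an "HH:MM value" pair. A rough Python equivalent is sketched below. It assumes the timestamp is Apache's bracketed %t field and that the timing value is the second-to-last field. The names `TIMESTAMP` and `minute_and_timing` are made up for this example.

```python
import re

# Matches a %t timestamp such as [04/Mar/2011:18:37:12 -0500] and captures
# the hour and minute.
TIMESTAMP = re.compile(r"\[[^\]]*\d{4}:(\d{2}):(\d{2}):\d{2}[^\]]*\]")


def minute_and_timing(line):
    """Return ('HH:MM', second-to-last field) for one log line, or None."""
    match = TIMESTAMP.search(line)
    fields = line.split()
    if match is None or len(fields) < 2:
        return None
    return "{}:{}".format(match.group(1), match.group(2)), fields[-2]


def to_dat_lines(lines, needle="GET /listing/"):
    """Yield 'HH:MM value' strings, one per matching log line."""
    for line in lines:
        if needle not in line:
            continue
        pair = minute_and_timing(line)
        if pair is not None:
            yield "{} {}".format(pair[0], pair[1])
```

Writing the yielded strings to a file produces the same two-column layout that the gnuplot script on slide 35 reads.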
  • 37. Error logs: PHP + Apache errors in one file. Simple logging interface.
  • 38. Error logs. Levels: error, info, debug. Namespace: perf, sql, __class__.
  • 39. Logger::error("Query exceeded 5 sec: $query", "sql_long_query");
  • 40. web0054 [Fri Mar 04 16:27:48 2011] [error] [sql_long_query] [mk04gw1p71] Query exceeded 5 sec: SELECT * FROM ...
  • 42. $ grep "16:27:48" access.log | wc -l
      1527
  • 43. web0054 [Fri Mar 04 16:27:48 2011] [error] [sql_long_query] [mk04gw1p71] Query exceeded 5 sec: SELECT * FROM ...
  • 44. In other words: error.log -> request_uid -> access.log (request uri, ab selections, user id, locale, platform, api key, etc.)
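
Slides 40 to 44 describe joining an error log entry to its access log entry through request_uid. A minimal sketch of that lookup is shown below. It assumes the error line layout from slide 40 (host, timestamp, level, namespace, request_uid, message) and assumes the uid appears as its own whitespace-separated field in the access log. The function names are hypothetical.

```python
import re

# Assumed layout, based on the sample line on slide 40:
# host [timestamp] [level] [namespace] [request_uid] message
ERROR_LINE = re.compile(
    r"^(?P<host>\S+) \[(?P<timestamp>[^\]]+)\] \[(?P<level>[^\]]+)\] ?"
    r"\[(?P<namespace>[^\]]+)\] \[(?P<request_uid>[^\]]+)\] (?P<message>.*)$"
)


def request_uid_of(error_line):
    """Extract the request_uid from one error log line, or None."""
    match = ERROR_LINE.match(error_line)
    return match.group("request_uid") if match else None


def matching_requests(request_uid, access_lines):
    """Return access log lines that carry the uid as a whole field."""
    return [line for line in access_lines if request_uid in line.split()]
```

Looking up one uid this way replaces scanning every request logged in the same second, which slide 42 puts at 1527 lines.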
  • 45. Filtering: tail -f error.log | grep -v "sql_long_query" | ...
  • 46. web0001 [04:28:54 2011] [error] [client 10.101.x.x] Help me, Rhonda.
      web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!
      web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!
      web0001 [04:28:54 2011] [error] [client 10.101.x.x] Heeeeeeellllllllllllllpp
      web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!
      web0001 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!
      web0201 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
      web0034 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo
      web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
      web1101 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
      web0201 [04:28:54 2011] [error] [client 10.101.x.x] Youve been eaten by a gr
      web0055 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!!!
      web0002 [04:28:54 2011] [warning] [client 10.101.x.x] Sky is falling.
      web0089 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
      web0020 [04:28:54 2011] [error] [client 10.101.x.x] Sky is falling.
      web1101 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!
      web0055 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
      web0001 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo
      web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
      web0034 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
      web0087 [04:28:54 2011] [fatal] [client 10.101.x.x] Sky is falling.
      web0002 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!
      web0201 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!
      web0077 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
      web0355 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo
      web0052 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
      web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
      web0003 [04:28:54 2011] [error] [client 10.101.x.x] Youve been eaten by a gr
      web0066 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!!!
      web0001 [04:28:54 2011] [warning] [client 10.101.x.x] Sky is falling
      web0020 [04:28:54 2011] [error] [client 10.101.x.x] Sky is falling.
      web1101 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!
      web0055 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
      web0001 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo
      web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
  • 47. Trending: fatals, errors, warnings
  • 48. Logster: run by cron; maintains a cursor on log files; simple parsing & aggregation; output to Ganglia or Graphite. github.com/etsy
  • 49. web0054 [Fri Mar 04 16:27:48 2011] [error] [login] [mk04gw1p71] User login failed. Reason: wrong password for ...
  • 50. ^.+? \[.+?\] \[(?P<log_level>.+?)\]
  • 51. if (fields['log_level'] == "fatal"):
          self.fatals += 1
      elif (fields['log_level'] == "error"):
          self.errors += 1
      elif (fields['log_level'] == "warning"):
          self.warnings += 1
      ...
  • 52. MetricObject("fatals", (self.fatals / self.duration), "per sec")
      MetricObject("errors", (self.errors / self.duration), "per sec")
      MetricObject("warning", (self.warnings / self.duration), "per sec")
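
Slides 50 to 52 show fragments of a Logster parser: match the level, count it, and report a per-second rate. The sketch below strings those steps together as a standalone class so the logic can be run without Logster. It does not use Logster's actual classes or API. `LevelCounter` and its methods are invented for illustration, and the pattern is stricter than the one on slide 50 so that the second bracketed field is always the one captured.

```python
import re

# host [timestamp] [level] ... ; only the level is captured.
LEVEL = re.compile(r"^\S+ \[[^\]]+\] \[(?P<log_level>[^\]]+)\]")


class LevelCounter:
    """Count fatal/error/warning lines and convert the counts to rates."""

    def __init__(self):
        self.counts = {"fatal": 0, "error": 0, "warning": 0}

    def parse_line(self, line):
        """Increment the counter for the line's level, if it is tracked."""
        match = LEVEL.match(line)
        if match and match.group("log_level") in self.counts:
            self.counts[match.group("log_level")] += 1

    def rates(self, duration_seconds):
        """Return events per second for each level over the given window."""
        return {
            level: count / duration_seconds
            for level, count in self.counts.items()
        }


counter = LevelCounter()
counter.parse_line(
    "web0054 [Fri Mar 04 16:27:48 2011] [error] [login] [mk04gw1p71] "
    "User login failed."
)
print(counter.rates(60))
```

Dividing by the window length is what turns raw counts into the per-second rates that get graphed as trend lines.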
  • 53. fatals errors warnings
  • 54. Logster Signed-in vs. Signed-out
  • 55. github.com/etsy
  • 56. Log a plethora of data. Don’t be afraid to use one file.
  • 57. Use custom fields to segment data.
  • 58. Correlate errors to specific requests.
  • 59. Make f#@k!ng graphs.
  • 60. Convert rates to trend lines.
  • 61. Take my logs. Please!
  • 62. Thank you. codeascraft.etsy.com github.com/etsy Mike Brittain, Director of Engineering, Infrastructure, Etsy.com. mike@etsy.com @mikebrittain