Application Logging in the 21st century - 2014.key

Tim Bunce
Senior Architect / Entropy Minimizer at TigerLead
Application Logging in the 21st Century
Austrian Perl Workshop – Oct 2014 
1
Logging is Like Lego 
Many Interchangeable Options
Not the focus of this talk 
2
Our Journey 
• Almost no logging when I joined in 2008 
• Incremental improvements as a background 
project over years 
• Currently capturing 600-900 logs / minute 
from ~200 machines 
• Not claiming "best practice", just some 
hopefully useful tips from our long journey 
3
Log file per-application 
• Adopted Log::Log4perl 
• Wrote utility function to add a log file (sketch below)
• Intercept warnings and fatal exceptions 
• Simple layout with timestamp and severity 
4
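The "utility function" isn't on the slide, but it might look something like this minimal sketch. The function name and layout details are assumptions; the Log4perl calls are real:

use Log::Log4perl;
use Log::Log4perl::Layout::PatternLayout;

# hypothetical helper: attach a per-app log file to the root logger
sub add_app_log_file {
    my ($app_name, $log_dir) = @_;

    my $appender = Log::Log4perl::Appender->new(
        'Log::Log4perl::Appender::File',
        name     => "${app_name}_file",
        filename => "$log_dir/$app_name.log",
        mode     => 'append',
    );
    $appender->layout( Log::Log4perl::Layout::PatternLayout->new(
        '%d{yyMMdd HH:mm:ss} %.1p> %m{chomp} [@%F{1}:%L %M{1}()]%n'
    ) );
    Log::Log4perl->get_logger('')->add_appender($appender); # '' = root logger
}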
Log4perl Layout 
Config file 
log4perl.rootLogger = INFO, TLScreen

log4perl.appender.TLScreen = Log::Log4perl::Appender::Screen
log4perl.appender.TLScreen.layout = Log::Log4perl::Layout::PatternLayout
log4perl.appender.TLScreen.layout.ConversionPattern = %d{yyMMdd HH:mm:ss} %.1p> %m{chomp} [@%F{1}:%L %M{1}()]%n

Example output

140929 14:06:25 I> some info message [@Broker.pm:221 process()]
140929 14:06:27 W> a warning [@BlackOakClientRole.pm:296 get_runner_for_class()]
5
Capture Warnings 
$SIG{__WARN__} = sub {

    # protect against infinite recursion
    return warn @_  ## no critic (RequireCarping)
        if $within_log_sig
        or not defined $Log::Log4perl::Logger::ROOT_LOGGER;
    local $within_log_sig = 1;

    local $Log::Log4perl::caller_depth = $Log::Log4perl::caller_depth + 1;

    chomp(my $msg = shift);
    get_logger()->warn($msg);
};
6
Capture Fatal Exceptions 
$SIG{__DIE__} = sub {

    return if $^S;             # we're in an eval, so ignore it
    die @_ if not defined $^S; # parsing module/eval

    # protect against infinite recursion
    die @_  ## no critic (RequireCarping)
        if $within_log_sig
        or not defined $Log::Log4perl::Logger::ROOT_LOGGER;
    local $within_log_sig = 1;

    local $Log::Log4perl::caller_depth = $Log::Log4perl::caller_depth + 1;

    chomp(my $msg = shift);
    get_logger()->fatal($msg);
    die "$msg\n"; # may duplicate the message, but that's better than losing it
};

7
Were there any errors? 
log4perl.rootLogger = INFO, TLScreen, TLErrorBuffer

log4perl.appender.TLErrorBuffer = TigerLead::Log::Appender::RecentSummaryBuffer
log4perl.appender.TLErrorBuffer.Threshold = ERROR
log4perl.appender.TLErrorBuffer.max_messages = 10
log4perl.appender.TLErrorBuffer.layout = Log::Log4perl::Layout::PatternLayout
log4perl.appender.TLErrorBuffer.layout.ConversionPattern = %m{chomp}

A ring buffer for log messages, used at the end of old batch job code to decide if something went wrong (see the sketch below).
8
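RecentSummaryBuffer is a custom TigerLead appender, so this is only a sketch of how the buffer might be consulted when a batch job finishes. The recent_messages accessor is an assumption; Log::Log4perl->appender_by_name is real:

use Log::Log4perl;

# hypothetical end-of-job check against the ring buffer appender
sub job_had_errors {
    my $buffer = Log::Log4perl->appender_by_name('TLErrorBuffer')
        or return 0; # appender not configured, assume ok
    my @errors = $buffer->recent_messages; # hypothetical accessor
    warn "job produced errors:\n", map { "  $_\n" } @errors if @errors;
    return scalar @errors;
}

exit( job_had_errors() ? 1 : 0 );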
State of play 
• Timestamped log message with severity etc 
• Per-app log files 
• Can tell if warnings or errors were produced 
But: 
• Not capturing stdout/stderr & non-perl apps 
9
Flow of log messages
[Diagram: Apps; an X marks log output going nowhere]
10
Flow of log messages
[Diagram: Apps → log files; an X marks output still not captured]
11
Capturing stdout/stderr 
setsid $start_daemons_command 2>&1 | setsid $capture_logs_command &

setsid puts the daemons into a separate process group, isolated from the terminal.
Capture stdout/stderr from all child processes and pipe it to a logger process.
The logger process is also in a separate, isolated process group.

We use daemontools, so for us:

start_daemons_command="svscan $supervise_dir"
capture_logs_command="multilog t s1000000 n100 dir $logdir"

multilog t prepends high-resolution timestamps to log messages;
accuracy depends on when the log was flushed.
multilog s1000000 n100 dir does log rotation for us.

The logger exits only when all child processes have closed stdout/stderr,
even if they've become daemons, forked more child processes and died.
12
Flow of log messages
[Diagram: Apps → common capture → log files]
13
State of play 
• Capturing stdout/stderr & non-perl apps 
But: 
• We had to log in to see what was happening
• No single place to watch errors and 
warnings across the systems 
• Wanted to parse log messages to extract 
more useful info 
14
Log Stream-Store-View 
Stream: 
Logstash – collect, edit, and forward logs 
Store: 
Elasticsearch – real-time distributed search 
and analytics engine. JSON REST over Lucene 
View: 
Kibana – browser based analytics and search 
dashboard for Elasticsearch 
15
Logstash Stream Processing 
Inputs: collectd drupal_dblog elasticsearch eventlog exec file ganglia gelf gemfire generator graphite heroku imap 
invalid_input irc jmx log4j lumberjack pipe puppet_facter rabbitmq rackspace redis relp s3 snmptrap sqlite sqs stdin 
stomp syslog tcp twitter udp unix varnishlog websocket wmi xmpp zenoss zeromq 
Codecs: cloudtrail collectd compress_spooler dots edn edn_lines fluent graphite json json_lines json_spooler 
line msgpack multiline netflow noop oldlogstashjson plain rubydebug spool 
Filters: advisor alter anonymize checksum cidr cipher clone collate csv date dns drop elapsed elasticsearch 
environment extractnumbers fingerprint gelfify geoip grep grok grokdiscovery i18n json json_encode kv metaevent 
metrics multiline mutate noop prune punct railsparallelrequest range ruby sleep split sumnumbers syslog_pri throttle 
translate unique urldecode useragent uuid wms wmts xml zeromq 
Outputs: boundary circonus cloudwatch csv datadog datadog_metrics elasticsearch elasticsearch_http elasticsearch_river 
email exec file ganglia gelf gemfire google_bigquery google_cloud_storage graphite graphtastic hipchat http irc jira juggernaut librato 
loggly lumberjack metriccatcher mongodb nagios nagios_nsca null opentsdb pagerduty pipe rabbitmq rackspace redis redmine riak 
riemann s3 sns solr_http sqs statsd stdout stomp syslog tcp udp websocket xmpp zabbix zeromq 
16
Logstash Configuration 
input {
    stdin { }
}

filter {
    grok {
        match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    date {
        match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
    }
}

output {
    elasticsearch { host => localhost }
    stdout { codec => rubydebug }
}
17
Elasticsearch Buzzwords 
• Document oriented. Schema free. 
• JSON in and out. RESTful API. 
• Powerful indexing and search via Lucene. 
• Distributed and massively scalable. 
• Big community, rapid growth. 
• Generally awesome. 
18
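To make "JSON in and out, RESTful API" concrete, here is a minimal sketch of querying Elasticsearch from Perl; the index pattern and search field are assumptions:

use HTTP::Tiny;
use JSON::PP qw(encode_json decode_json);

my $body = encode_json({
    query => { match => { message => 'error' } },
    size  => 5,
});
my $res = HTTP::Tiny->new->request(
    'POST', 'http://localhost:9200/logstash-*/_search',
    { content => $body },
);
die "search failed: $res->{status} $res->{reason}\n" unless $res->{success};

my $hits = decode_json($res->{content})->{hits};
print "total: $hits->{total}\n";
print "$_->{_source}{message}\n" for @{ $hits->{hits} };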
Kibana 
19
Our ELK setup 
• Started with single machine 
• Now using three machines 
• Logstash, Elasticsearch and Kibana on each 
• Elasticsearch cluster across all three 
• HAProxy load balancer in front of all three 
20
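A hypothetical sketch of that HAProxy front end for the Elasticsearch cluster; the server names, addresses and health check are assumptions:

frontend es_frontend
    mode http
    bind *:9200
    default_backend es_backend

backend es_backend
    mode http
    balance roundrobin
    option httpchk GET /_cluster/health
    server elk1 elk1.local:9200 check
    server elk2 elk2.local:9200 check
    server elk3 elk3.local:9200 check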
Flow of log messages
[Diagram: Apps → common capture → log files; logstash → Elasticsearch → Kibana]
21
syslog forwarding 
• Forwarding the system syslog was an easy first step
• We're using CentOS6 with rsyslog v7.6 
• Started forwarding notice+ severity messages 
but now forward info+ 
22
Rsyslog forwarding 
# buffering config 
$WorkDirectory /var/lib/rsyslog # where to place spool files 
$ActionQueueFileName logstash # unique name prefix for spool files 
$ActionQueueMaxDiskSpace 1g # 1gb space limit 
$ActionQueueSaveOnShutdown on # save messages to disk on shutdown 
$ActionQueueType LinkedList # run asynchronously 
$ActionResumeRetryCount -1 # infinite retries if host is down 

# forward info+ level logs from all facilities to logstash
*.info @@logstash-app-stag.tigerlead.local:5544;RSYSLOG_ForwardFormat

# RSYSLOG_ForwardFormat gives us a high-resolution timestamp and timezone
# We use TCP (not UDP) for reliability; may switch to RELP later
23
Flow of log messages
[Diagram: Apps → common capture → log files; System rsyslog → queue → logstash → Elasticsearch → Kibana]
24
Ship our logs to logstash 
• Wanted to parse messages but didn't want 
to do that on the central logstash server 
• Started with a Message::Passing utility to tail
and parse specific log files and ship as JSON
• Turned out we don't need much parsing
• Now using an extra rsyslogd that follows log
files and forwards to the local root rsyslogd (sketch below)
26
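A sketch of what that file-following rsyslogd instance might look like, using the imfile input module; the file name, tag and target port are assumptions:

$ModLoad imfile

$InputFileName /var/log/tiger/myapp.log
$InputFileTag myapp:
$InputFileStateFile state-myapp
$InputFileSeverity info
$InputFileFacility local0
$InputRunFileMonitor

# forward everything to the local root rsyslogd
*.* @@127.0.0.1:514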
Flow of log messages
[Diagram: Apps → common capture → log files → Shipper → logstash → Elasticsearch → Kibana; System rsyslog → queue → logstash]
27
Flow of log messages
[Diagram: Apps → common capture → log files → rsyslog → System rsyslog → queue → logstash → Elasticsearch → Kibana]
28
Eradicating 'our' log files 
• Still have our 'app log files' separate from the 
'system log files' in /var/log/* 
• Harder to correlate events between them 
• Experiment: use syslog for more/everything? 
• Want: per-app log files, high-res timestamp 
with lexical ordering (sort -m *.log | ...) 
• Let the system look after log rotation etc 
29
Send app logs to syslog 
log4perl.rootLogger = INFO, TLScreen, TLErrorBuffer, TLSyslog

log4perl.appender.TLSyslog = TigerLead::Log::Appender::Syslog
log4perl.appender.TLSyslog.layout = Log::Log4perl::Layout::PatternLayout
log4perl.appender.TLSyslog.layout.ConversionPattern = %m{chomp} [@%F{1}:%L %M{1}()]%n

The syslog format provides program name, severity and pid. 
30
Eradicating 'our' log files 
template( name="sortable_log_format" type="string"  # format for log lines
    # e.g. "2014-06-28 17:47:11.636078 $facility.$severity $program: $message"
    string="%TIMESTAMP:::date-pgsql%.%TIMESTAMP:::date-subseconds% %PRI-TEXT% %syslogtag%%msg:::sp-if-no-1st-sp%%msg:::drop-last-lf%\n"
)

template( name="file_per_programname" type="string"  # format for log file names
    # e.g. program="run-parts(/etc/cron.hourly)"
    # becomes "/var/log/tiger/run-parts" using the 'leading safe characters'
    string="/var/log/tiger/%programname:R,ERE,0,ZERO:^[-_a-zA-Z0-9]+--end%.log"
)

ruleset(name="write_tiger_progname_log_files") {
    action( Type="omfile" Template="sortable_log_format"
            DynaFile="file_per_programname" )
}

if ( ($syslogseverity <= 5) or not ($programname == [ ... ]) ) then {
    call write_tiger_progname_log_files
}
31
Flow of log messages
[Diagram: Apps → common capture → rsyslog → System rsyslog, which now also writes per-program log files; queue → logstash → Elasticsearch → Kibana]
32
Logstash Enrichment #1 
hostgroup - first word of server name 
• handy to focus in on a group of servers 
related to a particular service 
punct - just the punctuation chars 
• handy to focus on, or exclude, a particular 
'shape' of message 
33
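Neither enrichment is shown in the deck, so here is a hedged logstash sketch; the field names and the mutate-based punct are assumptions (logstash also shipped a punct filter at the time):

filter {
    # hostgroup: first word of the server name, e.g. "carbon" from "carbon-app-stag-ddc-01"
    grok {
        match => { "host" => "^(?<hostgroup>[^-.]+)" }
        tag_on_failure => []
    }
    # punct: keep only the punctuation characters of the message
    mutate { add_field => [ "punct", "%{message}" ] }
    mutate { gsub => [ "punct", "[a-zA-Z0-9\s]", "" ] }
}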
Quick Demo 
• Overview 
• Drill-down 
• Time ranges 
• Multiple queries 
• Share URL 
34
State of play 
• No longer had to log in to multiple machines
to see what was happening 
• Can easily drill-down to explore the logs 
from multiple machines and systems 
• Can share a URL to that view - very handy 
But now: 
• Want to be able to live-stream errors 
35
Live-stream to IRC 
• Separate production and staging channels 
• Currently just error severity or higher 
• Messages with 'alert' or 'emergency' severity 
are also sent to main developer channel 
• Proven to be very useful 
36
Live-stream to IRC 
But: 
• occasionally have floods of messages 
• logstash irc rate limiting behaviour is dumb 
• want to rate-limit only 'repeated' messages 
• 'repeated' should allow for minor differences 
• logstash can help... 
37
Enrichment: message_gist 
mutate {
    add_field => [ "message_gist", "%{message}" ]  # copy to edit
}
mutate {
    # normalize numbers
    gsub => [ "message_gist", "[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?", "N" ]
    # normalize double quoted strings
    gsub => [ "message_gist", "\"[^\"]*\"", "S" ]
    # normalize single quoted strings, but try to avoid matching apostrophes
    gsub => [ "message_gist", "(\A|\W)'[^']*'(?!\w)", "\1S" ]
    # truncate urls to remove the query/fragment part
    gsub => [ "message_gist", "(\w:/[^?#\s]*)\S*", "\1" ]
}
fingerprint {  # convert the normalized string into an integer hash
    source => "message_gist"
    target => "message_gist"
    method => "MURMUR3"
}
38
Enrichment: repeat tag 
if [severity] and [severity] =~ /0|1|2|3|4/ {

    throttle {
        period => 60  # seconds

        before_count => -1
        after_count => 2  # allow N within period before throttling

        key => "%{hostgroup}%{severity}%{program}%{message_gist}"
        max_counters => 10000  # track this many variants

        add_tag => "repeat"
    }

    # may add a more strict 'duplicate' tag here in future
    # using period=>5, after_count=>1, and %{message} not %{message_gist}
}
39
Enrichment: late tag 
# flooding may cause a backlog that delays messages reaching logstash 
# tag messages that arrive 'late' 
ruby {
    code => "
        msg_age = Time.now - event['@timestamp']

        if    msg_age >= +60 then msg_tag = 'late'   # delayed
        elsif msg_age <= -60 then msg_tag = 'early'  # craziness
        end

        if msg_tag
        then
            event.tag msg_tag
            event['message_delay'] = msg_age.to_i  # age
        end
    "
}
40
Better IRC live-stream 
if [severity] and [severity] =~ /0|1|2|3|4/ 
and "repeat" not in [tags] 
and (![message_delay] or [message_delay] < 600) # not too 'late' 
{ 
if [severity] =~ /0|1|2|3/ { # 4 (warning) is currently too noisy 
irc { 
channels => [ "#logprod" ] 
messages_per_second => 10 
format => "%{severity_label} %{host} %{program}: %{message}" 
} 
} 
if [severity] =~ /0|1/ { # emergency and alert only 
irc { 
channels => [ "#l2dev" ] 
messages_per_second => 5 
format => "%{severity_label} %{host} %{program}: %{message}" 
} 
} 
} 
41
Flow of log messages
[Diagram: Apps → common capture → log files → rsyslog → System rsyslog → queue → logstash → Elasticsearch → Kibana; logstash → IRC]
42
State of play 
• Live-stream to IRC, promotes awareness 
• Developers work to reduce spurious noise 
But now we want more context: 
• "what was the app working on when that 
warning or error was triggered?" 
• "what was the web request URL?" 
or "what were the async job parameters?" 
43
How to get context? 
• Add more info into every log message text, 
then parse it out again? Not ideal. 
• Start by capturing all the HTTP access logs 
• Could do log-shipping for each access log file 
• But all traffic passes through HAProxy 
• So HAProxy logging can give us everything 
44
HAProxy logs 
• already had haproxy notice+ messages 
• now added haproxy traffic logs, 
first HTTP then TCP as well 
• can include one request and response cookie 
• plus multiple request and response headers 
45
HAProxy Configuration 
defaults
    mode tcp
    log-format %ci [%t] %ft %b/%s %Tw/%Tc/%Tt %U %B %ts %ac/%fc/%bc/%sc/%rc %sq/%bq

defaults
    mode http
    log-format %ci [%t] %ft %b/%s %Tw/%Tc/%Tt %U %B %tsc %ac/%fc/%bc/%sc/%rc %sq/%bq %ID %{+Q}r %ST %Tq/%Tr %{+Q}CC %{+Q}hr %{+Q}CS %{+Q}hs

frontend stripes-prod-frontend 108.168.241.12:80  # example service
    capture request header Referer len 200
    capture request header User-agent len 300
    capture response header Location len 300
    capture cookie _session= len 63
46
HAProxy Logs 
Example TCP log:

10.60.201.12 [09/Oct/2014:22:29:45.317] carbon-stag-frontend carbon-stag-backend/carbon-app-stag-ddc-01 1/0/2 3040 0 -- 57/45/45/45/0 0/0

Example HTTP log:

10.60.199.78 [09/Oct/2014:21:34:04.361] apex-fe-stag-frontend apex-fe-stag-backend/apex-fe-stag-ddc-01 0/0/2594 956 86661 ---- 63/1/0/0/0 0/0 0A3CC74E:CC62_0A3CC933:0050_5436FF4C_462C7E:696C "GET /a/sa/search?rgu=0&domain_id=10366 HTTP/1.1" 200 337/2256 "_session=4889b2859286db6511f2e9e9b33cdbe37f5b43ab" "{|Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.124 Safari/537.36}" "f_session=4889b2859286db6511f2e9e9b33cdbe37f5b43ab" "{}"
47
Logstash for HAProxy 
• change the host field (and thus hostgroup) to 
the backend machine name, so the logs from 
haproxy appear to be coming from the 
appropriate machine 
• parse out request URL parameters 
• decode URL parameters 
48
Logstash for HAProxy 
# extract the request url params into a 'params' hash
mutate { gsub => [ "request", "#.*", "" ] }  # remove fragment, if any, first
kv { source => "request" field_split => "&?" target => "params" }

# XXX disabled re https://github.com/elasticsearch/logstash/issues/1695
# urldecode { field => "params" all_fields => true }

if [response] >= 500 {
    mutate { replace => [ "severity", "4", "severity_label", "warn" ] }
}
else if [response] >= 400 {
    mutate { replace => [ "severity", "5", "severity_label", "notice" ] }
}

mutate {  # replace raw message with a human friendly version to view/search on
    gsub => [ "request", "\?.*", "" ]  # remove params now we've extracted them
    replace => [ "message", "%{be_host} %{client_ip} %{Tw}/%{Tc}/%{Tt}ms %{bytes_in}b %{bytes_out}b %{response} %{verb} %{request}" ]
}

(Abridged!)
49
State of play 
• now have detailed TCP and HTTP traffic logs 
But: 
• still parsing textual messages 
• still hard to handle multi-line messages 
• still don't have contextual data for logs 
• still can't correlate http to application logs 
50
Log as JSON from app 
• Parsing textual log messages to extract data 
that your own code put there is a bit dumb 
• Log as JSON lines instead (jsonlines.org) 
• Opens the door to logging extra information 
• Bonus: solves the multi-line message problem, 
at least for perl apps 
51
Log::Log4perl::Layout::JSON 
log4perl.rootLogger = INFO, TLScreen, TLFile, TLErrorBuffer, TLSyslogJSON

log4perl.appender.TLSyslogJSON = TigerLead::Log::Appender::Syslog
log4perl.appender.TLSyslogJSON.Threshold = INFO
log4perl.appender.TLSyslogJSON.layout = Log::Log4perl::Layout::JSON
log4perl.appender.TLSyslogJSON.layout.prefix = @cee:  # used as tag
log4perl.appender.TLSyslogJSON.layout.field.message = %m
log4perl.appender.TLSyslogJSON.layout.field.src_file = %F{1}
log4perl.appender.TLSyslogJSON.layout.field.src_sub = %M{1}
log4perl.appender.TLSyslogJSON.layout.field.src_line = %L

Example output (spaces and line breaks added for clarity):

2014-10-08 12:56:28.641086 local0.info 70-lead-basic-t[13374]: @cee:{
    "message":"...\n...\n...", "src_file":"Foo.pm", "src_sub":"frobnicate",
    "src_line":"18" }

Note that src_file, src_sub and src_line used to be appended to the message text.
52
Decoding JSON in logstash 
grok {
    # @cee: is the syslog 'CEE Event Flag' per https://cee.mitre.org/
    match => { message => "^@cee: ?%{GREEDYDATA:cee_data}" }
    add_tag => [ "cee" ]
    tag_on_failure => []
}

if ("cee" in [tags]) {
    json {
        source => "cee_data"
        remove_field => [ "cee_data" ]
    }
}
53
State of play 
• now have rich JSON formatted log messages 
• multi-line messages are no longer a problem 
But: 
• still only very basic contextual data for logs 
• still can't correlate http to application logs 
54
"Context Data" 
• Significant items of 'ambient information' 
• The current 'things being worked on' 
• Would like that info added to any log msgs 
• Including warnings and fatal exceptions 
(e.g. if hooked via $SIG{__WARN__}) 
55
Context Data 
for my $foo_id (@list_of_foo_ids) {

    # we want the current $foo_id value to be included
    # in any log messages in this scope

    do_something_useful($foo_id);
}

# we DON'T want $foo_id to be included in any future log messages
56
Context Data 
• Put the 'ambient information' in a hash 
• Add the contents of the hash to the JSON 
• Use local to limit the scope 
57
Context Data 
for my $foo_id (@list_of_foo_ids) {
    local log_context->{foo_id} = $foo_id; # simple!
    do_something_useful($foo_id);
}

The imported log_context utility:

sub log_context { return \%Log::Log4perl::MDC::MDC_HASH }

The Log::Log4perl::Layout::JSON config line:

log4perl.appender.TLSyslogJSON.layout.include_mdc = 1
58
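Why local works here: localizing a hash element saves that element and restores it (deleting a key that didn't previously exist) when the scope exits. A self-contained demo with hypothetical names:

use strict;
use warnings;

my %context;
sub context { return \%context }   # stand-in for log_context

sub show { print join(', ', map { "$_=$context{$_}" } sort keys %context) || '(empty)', "\n" }

{
    local context()->{foo_id} = 42;
    show(); # foo_id=42
}
show();     # (empty) - the key was removed when the scope exited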
Context Data 
Context added to root hash by default:

2014-10-08 12:56:28.641086 local0.info 70-lead-basic-t[13374]: @cee:{
    "message":"...\n...\n...", "src_file":"Foo.pm", "src_sub":"frobnicate",
    "src_line":"18", "foo_id":42 }

Optionally put context data items into a nested hash:

log4perl.appender.TLSyslogJSON.layout.name_for_mdc = extra_stuff

2014-10-08 12:56:28.641086 local0.info 70-lead-basic-t[13374]: @cee:{
    "message":"...\n...\n...", "src_file":"Foo.pm", "src_sub":"frobnicate",
    "src_line":"18", "extra_stuff":{ "foo_id":42 } }
59
State of play 
• now have easy way to add contextual data 
• array and hash refs work (keep it small) 
But: 
• what contextual data should we include? 
• request URL? decoded parameters? 
• expensive to include in every message 
60
HAProxy Correlation 
• We have a stream of haproxy logs 
• We have a stream of application logs 
• Want to be able to correlate them 
"what HTTP request caused this warning?" 
• Add unique-id to HTTP log & HTTP header 
61
HAProxy Configuration 
defaults
    mode http
    unique-id-format %{+X}o %ci:%cp_%fi:%fp_%Ts_%rt:%pid
    unique-id-header X-TLXID
    log-format %ci [%t] %ft %b/%s %Tw/%Tc/%Tt %U %B %tsc %ac/%fc/%bc/%sc/%rc %sq/%bq %ID %{+Q}r %ST %Tq/%Tr %{+Q}CC %{+Q}hr %{+Q}CS %{+Q}hs

• HAProxy now generates a unique-id for each HTTP request 
• Adds it to the HTTP request as an X-TLXID header
• Includes the unique-id value in the syslog message 
62
Capture X-TLXID 
package TigerLead::Plack::Middleware::SetUpLogContext;
use strict;
use warnings;
use parent qw( Plack::Middleware );

use Plack::Request;
use TigerLead::Log qw(log_context);

sub call {
    my ($self, $env) = @_;

    my $req = Plack::Request->new($env);
    # reset log context at start of a new request
    %{ log_context() } = (tlxid => scalar $req->header('X-TLXID'));

    return $self->app->($env);
}
63
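How the middleware might be wired into an app, as a sketch (the .psgi contents are an assumption; the '+' prefix tells Plack::Builder the middleware name is fully qualified):

use Plack::Builder;

my $app = sub {
    my $env = shift;
    return [ 200, [ 'Content-Type' => 'text/plain' ], [ "hello\n" ] ];
};

builder {
    enable '+TigerLead::Plack::Middleware::SetUpLogContext';
    $app;
};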
Correlation 
• Given any log message from a web app we 
can now find the HTTP request that was 
being processed at the time 
• That includes the session cookie, so we can 
view the stream of requests for that session 
• Demo... 
64
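A sketch of the correlation query from Perl; the tlxid field name, its mapping, and the index pattern are assumptions:

use HTTP::Tiny;
use JSON::PP qw(encode_json decode_json);

my $tlxid = shift @ARGV or die "usage: $0 <tlxid>\n";
my $body  = encode_json({
    query => { term => { tlxid => $tlxid } },
    sort  => [ '@timestamp' ],
});
my $res = HTTP::Tiny->new->request(
    'POST', 'http://localhost:9200/logstash-*/_search',
    { content => $body },
);
die "search failed: $res->{status}\n" unless $res->{success};

print "$_->{_source}{message}\n"
    for @{ decode_json($res->{content})->{hits}{hits} };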
Questions? 
65