Apache logs monitoring
1. Monitoring Apache
• There are many ways to examine apache’s status and
performance
– apachectl -v – tells you the version number
– apachectl -V – gives you complete compiler settings
– apachectl status – gives you the server’s status in the form of a
“scoreboard” where, for each apache child, you see its status as
one of these characters:
• _ waiting for connection
• S starting up
• R reading a request
• W sending a reply
• K keepalive
• D performing DNS lookup
• C closing connection
• L logging information
• G gracefully finishing
• I idle cleanup
• . open slot with no current process
_____CCCCCCC_____RR..................
_CCCCCRR_________CC_CCC__.......
_____CCCCCCCRW______..................
____CCCCLLCCCCCR____..................
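A scoreboard line like the ones above can be tallied with a few lines of code. The following is only a sketch: summarize_scoreboard is a hypothetical helper (not part of Apache), and it simply counts the state characters from the legend above.

```python
from collections import Counter

# Meaning of each scoreboard character, copied from the legend above.
STATES = {
    "_": "waiting for connection",
    "S": "starting up",
    "R": "reading a request",
    "W": "sending a reply",
    "K": "keepalive",
    "D": "performing DNS lookup",
    "C": "closing connection",
    "L": "logging information",
    "G": "gracefully finishing",
    "I": "idle cleanup",
    ".": "open slot with no current process",
}

def summarize_scoreboard(scoreboard: str) -> dict:
    """Count how many slots are in each state (hypothetical helper)."""
    counts = Counter(ch for ch in scoreboard if ch in STATES)
    return {STATES[ch]: n for ch, n in counts.items()}

# Tally one of the example scoreboard lines.
for state, n in summarize_scoreboard("_____CCCCCCCRW______....").items():
    print(f"{n:3d}  {state}")
```

A large share of slots stuck in one state (for example, many children in C or W) is usually the first thing worth looking at.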
2. Extended Status
• You can obtain even more information (including PID)
using ./apachectl fullstatus
– this gives you a snapshot of the current status of each child
– to use fullstatus
• make sure the mod_status module is loaded (in apache 2.2 it is part
of the default build, so this step is usually not needed)
• add the directive ExtendedStatus On to your httpd.conf file
• add a <Location> container for the address /server-status in your
httpd.conf file that has the directive SetHandler server-status
– Now, when you type ./apachectl fullstatus, the listing gives you
more details:
• Srv – child server number & generation (in the form 5-1), and PID
• Accesses of this connection for this child
• Mode (as per last slide, _, C, R, W, etc)
• CPU usage, number of seconds
• Seconds since beginning of most recent request
• Milliseconds required to process most recent request
• Kilobytes transferred for the connection
• Mbytes transferred for this child
3. server-status and server-info
• You can also view this information in a web
browser
• Either server status information (as from the last
slide), or server information
– for either/both of these, add a <Location /server-status> or
<Location /server-info> container
• NOTE: the URL for these is simply http://ipaddress/server-status
or http://ipaddress/server-info
– also add the proper handler to the container, SetHandler
server-status or SetHandler server-info (server-info requires the
mod_info module)
• Information available by server-info includes
– version, compilation date
– modules loaded, directives of each
– hostname, port
– timeout, keep-alive directives
– server root, configuration file location
4. Security
• Making this information available presents a security risk
– by knowing the version of apache, it is easier to hack into the server
and manipulate/destroy files
– yet this might be useful for a web administrator to check status or
server information at any time either locally or remotely
• In the <Location> container from the previous slide, let’s add
proper allow/deny statements to limit who can access this
information
– deny access to all except for specific IP address/port of the location
where our webadmin will access the server information from
• Order deny,allow
• Deny from all
• Allow from 10.2.3.0/24
– by using 0 as the last octet together with /24, we are allowing
access to anyone on this subnetwork (10.2.3.0 through 10.2.3.255)
• the 24 is the number of leading bits of the address that must match
(8 for the first octet, 16 for the first two, 24 for the first three)
– do this for both <Location /server-status> and
<Location /server-info> containers (if we use both)
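To see what the /24 mask admits, the check can be imitated with Python's standard ipaddress module. This is a sketch of the matching rule only, not of how Apache implements it; is_allowed and the sample client addresses are made up for illustration.

```python
import ipaddress

# The subnet from the Allow directive above.
ALLOWED = ipaddress.ip_network("10.2.3.0/24")

def is_allowed(client_ip: str) -> bool:
    """Imitate 'Deny from all' followed by 'Allow from 10.2.3.0/24'."""
    return ipaddress.ip_address(client_ip) in ALLOWED

# Hypothetical client addresses: only the first shares the first 24 bits.
for ip in ("10.2.3.17", "10.2.4.17", "192.168.1.5"):
    print(ip, "allowed" if is_allowed(ip) else "denied")
```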
5. Error Pages
• Apache is configured to generate a generic page on
an error based on the status code
– these response pages may lack useful information and so
apache allows you to alter the default configuration on
errors
– you can
• create your own error pages
• create your own error scripts
– for instance, a php script
• generate a short automated message
• use a multi-language error page available in the errors directory
• redirect the attempt to a local URL
– see for instance what happens at www.nku.edu when you specify any
incorrect URL/filename
• redirect the attempt to an external URL
– in your httpd.conf file, you set these up using the
ErrorDocument directive of the form:
• ErrorDocument error-code document-name (or “message”)
6. Examples
• ErrorDocument 401 /subscribe.html
– here, presumably the user was not able to validly log in and
thus generated a 401 error, so we bring up the page
/subscribe.html
• ErrorDocument 404 /cgi-bin/notfound.php
– here, we run a script that we set up to handle any 404 (URL
not found) errors (this is what NKU does)
• ErrorDocument 500 “Server Error!!”
– here, we return a page with the text “Server Error!!”
• ErrorDocument 410 /var/web/errors/HTTP_GONE.html.var
– here, we use one of the error pages made available in apache
– these can respond differently based on several situations
• language of choice based on language negotiation, response includes
environment variable(s) value(s) such as $HTTP_REFERER
• ErrorDocument 505 http://www.errors.org/error505.cgi
– redirect to an external URL because of wrong HTTP version
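The examples above fall into three kinds of target: a local path, an external URL, or a literal message. A rough sketch of that distinction follows; classify_error_document is a hypothetical helper, and the rule it applies (leading slash, full URL, anything else) is a simplification of what Apache does.

```python
def classify_error_document(target: str) -> str:
    """Guess how an ErrorDocument target is handled (simplified)."""
    if target.startswith("/"):
        # Local URL-path: served by this server.
        return "local"
    if target.startswith(("http://", "https://")):
        # Full URL: the client is sent a redirect.
        return "external"
    # Anything else is treated here as message text.
    return "message"

for target in ("/subscribe.html",
               "/cgi-bin/notfound.php",
               "Server Error!!",
               "http://www.errors.org/error505.cgi"):
    print(f"{classify_error_document(target):8s} {target}")
```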
7. Using the Multi-Language Files
• To use the multi-language error document files available in
your error directory, there are several steps you will have to
take
– create an alias from /error/ to the actual location in your
filespace of your error documents
• Alias /error/ “/usr/local/apache2/error/”
– notice the use of the trailing / here!
– create a <Directory> container for that directory containing at a
minimum
• Options IncludesNoExec
• AddOutputFilter Includes html
• AddHandler type-map var
– the files in this directory end with a .var extension
• Order deny,allow
• Allow from all (this is needed since / (root) is denied to all)
– add your ErrorDocument directive
• e.g., ErrorDocument 404 /error/HTTP_NOT_FOUND.html.var
• these already exist in the file httpd-multilang-errordoc.conf
8. More on Multi-Language Pages
• The nice thing about the use of the multi-language error
pages that are available in Apache is that, based on
browser information, the actual language returned can be
specialized
– if you look at any of these files, you see entries for
Content-language for a number of different languages
– based on the Accept-Language header sent by the browser, the
Body with the matching Content-language is selected and returned
• further, an if statement allows for a more specialized message as to
whether the page was reached directly or from a referer (a link)
• In order to get the language selected appropriately, you
might want to include two additional directives in your
<Directory> container from the previous slide:
– LanguagePriority list (of languages here, e.g., en cs de es …)
– ForceLanguagePriority Prefer Fallback
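The selection logic can be sketched roughly as follows. This is a simplified illustration, not Apache's actual negotiation algorithm (which also handles language prefixes, ties and other details); pick_language and its arguments are hypothetical names.

```python
def pick_language(accept_language: str, available: list, priority: list) -> str:
    """Choose an error-page language, loosely imitating negotiation."""
    # Parse a header such as "de;q=0.8, en;q=0.5" into (language, quality).
    prefs = []
    for item in accept_language.split(","):
        item = item.strip()
        if not item:
            continue
        lang, _, params = item.partition(";")
        quality = 1.0  # quality defaults to 1 when no q= is given
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        prefs.append((lang.strip().lower(), quality))
    # Try the browser's languages, highest quality first.
    for lang, quality in sorted(prefs, key=lambda p: p[1], reverse=True):
        if quality > 0 and lang in available:
            return lang
    # Fallback: the first available language from the LanguagePriority list.
    for lang in priority:
        if lang in available:
            return lang
    return available[0]

print(pick_language("fr;q=0.9, de;q=0.8", ["en", "de", "es"], ["en", "cs", "de", "es"]))
```

The fallback loop plays the role that LanguagePriority with ForceLanguagePriority Fallback plays in the configuration: it guarantees that some page is returned even when no requested language is available.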
9. External Redirects on Errors
• An external redirect is not a matter of simply
“passing the buck”
– recall from chapter 5 a redirect sends a response to the
web browser with a redirect status code (30x) and a new
URL
• the web browser then sends out a new HTTP request of the new
URL
– this can confuse crawlers and other agents that expect either
content or an error code in response to their requests; instead,
they are given a new URL to pursue
• the redirection can also cause problems if it arises during
authentication: the browser does not receive a 401 code and so
will not prompt the user for a password, potentially leaving the
user confused as to why the original request was not fulfilled
and why they were taken to a different location
10. Automatic Logging
• There are two forms of logging that are taken care of
automatically
– access logging – logging every request sent by clients (browsers,
users, software)
– error logging – logging any request that results in an error
• Either type of event will place a new entry into the appropriate
log file
• Each entry will contain at a minimum
– the time/date of the request
– the URL
– the IP address of the requester
• For errors, the status code will be included with the entry
• For accesses, the command serviced (e.g., GET), the status
code, and the browser’s specification (type, OS, HTTP version)
will be included with the entry
• Typically, Apache performs the logging itself
– rather than invoking syslogd or klogd as with other Linux services
11. Error Logging
• Errors can be written to a log file, sent to a pipe (that
is, piped to a Linux command) or written to the Linux
syslog service
• There are two apache directives to control logging
– ErrorLog – specify the file or syslog
• if you do not set ErrorLog, it defaults to writing to the file
error_log
• if you specify a filename, it is assumed to be under ServerRoot
unless you specify the full path
• if the filename starts with | then the information is piped to the
command that follows |
– as in | cat which would display the error information to the terminal
window, probably a poor option
• if you specify syslog, the syslogd service is used and follows the
action in the /etc/syslog.conf file for local7 messages
– LogLevel – one of emerg, alert, crit, error, warn, notice,
info, debug (see table 7-2 on page 182 for more detail)
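The LogLevel values are ordered from most to least severe, and setting a level logs that level plus everything more severe. A minimal sketch of that ordering follows; is_logged is a hypothetical helper, and it ignores any special cases in how Apache treats particular levels.

```python
# Levels in the order given above, most severe first.
LEVELS = ["emerg", "alert", "crit", "error", "warn", "notice", "info", "debug"]

def is_logged(message_level: str, configured_level: str) -> bool:
    """True if a message is at least as severe as the configured LogLevel."""
    return LEVELS.index(message_level) <= LEVELS.index(configured_level)

# With LogLevel warn, crit messages are kept and info messages are dropped.
for level in ("crit", "warn", "info"):
    print(level, is_logged(level, "warn"))
```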
12. IE Browser Error Pages
• IE tends to ignore the error pages sent by Apache
and it displays its own, more generic page
– MS considers their own pages to be more user friendly
– the problem is that the error page sent by Apache might
include some useful content that an IE user will not see
– IE will only display error pages for
• 403, 405, 410 errors if the page’s size > 256 bytes
• 400, 404, 406, 408, 409, 500, 501, 505 errors if the page’s size >
512 bytes
• but these pages, as generated by apache, tend to be smaller than
the byte size listed above
– there is a way to force IE to display the sent error page
using the Windows Registry, but most users will not be
aware of this
– or, you could create your own error pages and make sure
that they are > 512 bytes to force IE to post your pages
• I tried both of these and I could not get IE to post the apache
page so I’m unsure if it’s even possible!
13. I/O Logging
• Aside from logging requests and errors, you can log
regular apache I/O if desired
– this requires the use of the mod_dumpio module
• this is not part of the base apache, so it must be separately
compiled
– add the LoadModule statement to httpd.conf
– there are three directives
• DumpIOInput on (or off, the default)
• DumpIOOutput on (or off, the default)
• LogLevel value where value is one of emerg, alert, crit, error,
warn, notice, info, debug – here, you need to use debug
– the I/O logging is sent to your error log file, and because
this generates an enormous number of messages, you will
probably not want to use this feature at all, or at least not
for very long
14. Access Logging
• All http requests to your server are logged in the
access log
– these include requests that result in errors
• Unlike error logging, these can only be logged to a
specified file or written to a pipe
– they cannot be sent to syslogd
• You can specialize the access log using the
mod_log_config module which offers two directives
– CustomLog allows you to specify a new place for the
output (a different file or a pipe)
– LogFormat which allows you to specify how accesses are
logged in terms of what types of information (we will see
details on this in the next slide)
• in addition, the mod_setenvif module can be used to
set various environment variables based on attributes of a request
15. Log Formatting
• The LogFormat directive allows you to specify how you want
your log entries to appear
– you are able to define different formats and have them sent to
different files although this may not be useful
• LogFormat “format” name
• CustomLog location name
– format is a specification of the type of information to record and in
what order it should be recorded (covered over the next few slides)
– location is the location in your file system where you want the log
file to be written
• if you specify a relative path, it is relative to ServerRoot
– name is the same on both lines used to link a specified format to a
log file
• you can shorten this by just doing CustomLog location “format” and omit
the second directive and the name, but this means that you cannot share a
format between two or more different log files
• You can also specify under what condition(s) a format might
be used (for instance, if the access resulted in a 200 status)
– therefore, you can specify multiple logs, each with its own format
16. More
• The “format” will comprise a series of percent directives
(covered on the next slide) that specify what information
should be logged (recorded)
– these include such pieces of information as requestor’s IP
address, URL requested, time of request, etc
– the entire format is placed inside of “”
• for example, “%a %U” means “IP address of client and URL requested”
• Conditional directives allow you to specify what status
code(s) you desire for that piece of information to be
logged
– multiple status codes are separated by commas, and the code(s)
appear between % and the directive
• %200a means to log %a (IP addr of client) if the status code is 200
• %400,401,402,403,404U means to log %U (URL) if the status code is
any of 400-404
– you can also place ! in front of the number as in %!200a, which
logs %a only if the status code is not 200
– if the condition is not met, the requested value is replaced by a
hyphen (-) in the log file
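The rule for conditional directives can be imitated in a few lines. This is a sketch of the behavior described above, not Apache code; conditional is a hypothetical helper that takes the status list (the part between % and the directive letter), the value that would be logged, and the request's status.

```python
def conditional(spec: str, value: str, status: int) -> str:
    """Apply a status condition such as '200', '400,401' or '!200'."""
    negate = spec.startswith("!")
    codes = {int(code) for code in spec.lstrip("!").split(",")}
    matched = status in codes
    if negate:
        matched = not matched
    # When the condition fails, a hyphen is logged instead of the value.
    return value if matched else "-"

print(conditional("200", "10.2.3.17", 200))   # like %200a on a 200 response
print(conditional("200", "10.2.3.17", 404))   # like %200a on a 404 response
print(conditional("!200", "10.2.3.17", 404))  # like %!200a on a 404 response
```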
17. Useful Percent Directives
• The full set of percent directives is given in table 7-3 on page
188, here, we look at the most useful
– %a – remote IP address
– %A – server IP address
– %B – bytes sent excluding header
– %c – connection status when complete
– %D – duration of request (in microseconds)
– %f – filename (resource)
– %H – request protocol
– %m – request method
– %P – PID of child servicing request
– %s – status
– %t – time of request
– %u – remote user (only available if user has authenticated)
– %U – requested URL path (without any query string)
– %{X}e – output the value of environment variable X
– %{X}i – output X’s header (X might be User-agent or Referer)
– %{format}t – output the time using the provided format
18. Examples
• The Common Log Format is a standard format developed
for NCSA servers
– this format string is “%h %l %u %t \"%r\" %>s %b” which is
• the host, remote logname (or - if not known/supported), user name (if
known through login), date, request (inside \"\" since the \ is an
escape character), status (3 digit code; the > selects the final
status after any internal redirect), and bytes transferred
excluding the header (- if none)
• Imagine that your website is linked from other sites and
you want to know how often a visitor has reached your site
through one of those links (referers)
– use “%{Referer}i -> %U”
– this records into your log file the referer and the URL (how
they got here and where they tried to go)
• Or you might want to know the web browser of a visitor
– use “%{User-agent}i”
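An entry written in the Common Log Format can be split back into its fields with a regular expression. The sketch below is one way to do it; parse_clf and the sample line are made up for illustration, and real logs may contain entries this pattern does not cover.

```python
import re

# One named group per field of "%h %l %u %t \"%r\" %>s %b".
CLF = re.compile(
    r'(?P<host>\S+) (?P<logname>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_clf(line: str):
    """Return the fields of a Common Log Format line, or None if no match."""
    match = CLF.match(line)
    return match.groupdict() if match else None

# Made-up sample entry.
sample = ('10.2.3.17 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /index.html HTTP/1.0" 200 2326')
entry = parse_clf(sample)
print(entry["host"], entry["request"], entry["status"], entry["bytes"])
```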
19. Multiple Logs
• Let’s imagine that we want to have one log for all
successful operations, one log for redirections, and one
log for 40x errors
– LogFormat “%200a %200U %200t” success
– CustomLog logs/success_log success
– LogFormat “%301,302,303a %301,302,303U %301,302,303t” redirection
– CustomLog logs/redirection_log redirection
– LogFormat “%401,403,404,410a %401,403,404,410U %401,403,404,410t” error40x
– CustomLog logs/error_log_40x error40x
• note that every request still produces a line in each of these
files; when the status does not match, the fields appear as hyphens
• We could change the format so that each log file logs
different types of information
– for instance we might want to know the specific error for the
error_log_40x file by adding %s
• note that %s will return the original request’s status in the case of a
redirection (e.g., 30x), if we want the final status, use %>s
– or the size of the file (%B) on a 200 success
20. SetEnvIf Directives
• This directive allows you to set an environment
variable which you can then use for your logging
– the format of the directive is
• SetEnvIf attribute regex env-variable[=value]
– you can set multiple variables if desired
– the attribute is usually a value from the request header
(e.g., Method, Protocol, Host, User-Agent, Referer,
Range) or it can be one of Remote_Addr, Remote_User,
Request_URI or it can be an already defined environment
variable
• example: SetEnvIf Referer www\.nku\.edu internal
– this sets the variable internal (to 1, i.e., true)
• example: SetEnvIf Remote_Addr 127\.0\.0\.1 self
• example:
– SetEnvIf Request_URI “\.gif$” type=gif
– SetEnvIf Request_URI “\.jpg$” type=jpg
– this will set the variable type to the type of image requested
– the \ before each . is needed because an unescaped . in a regex
matches any character
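The effect of these directives can be imitated with ordinary regular expressions. This is only a sketch of the matching idea: apply_setenvif and the RULES table are hypothetical, the request is represented as a plain dictionary, and a variable set without a value is given "1" here.

```python
import re

# (attribute, regex, variable, value) tuples modeled on the examples above.
RULES = [
    ("Referer", r"www\.nku\.edu", "internal", "1"),
    ("Remote_Addr", r"127\.0\.0\.1", "self", "1"),
    ("Request_URI", r"\.gif$", "type", "gif"),
    ("Request_URI", r"\.jpg$", "type", "jpg"),
]

def apply_setenvif(request: dict) -> dict:
    """Return the variables that the rules would set for this request."""
    env = {}
    for attribute, pattern, variable, value in RULES:
        # The regex may match anywhere in the attribute, hence search().
        if re.search(pattern, request.get(attribute, "")):
            env[variable] = value
    return env

print(apply_setenvif({"Request_URI": "/img/logo.gif", "Remote_Addr": "10.2.3.17"}))

# Why the dot should be escaped: ".gif$" also matches a name ending in "igif".
print(bool(re.search(r".gif$", "/bigif")), bool(re.search(r"\.gif$", "/bigif")))
```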
21. Log Rotation
• If you are running a web server for even a modest
sized web site, you may receive thousands of hits a
day
– each of these is logged in the access_log file and the
error_log file may become large as well
– log rotation is the process of moving the current log file
into a “retired” log file
• these typically appear with .# after their name as in access_log,
access_log.1, access_log.2, access_log.3 with the previous
access_log.3 being deleted and the new access_log starting
blank
• depending on how quickly a log file fills up, you may want to
rotate the files every day, every week or every month
– while you might write your own script to handle this and
then issue a crontab job, there is a built-in apache
program called rotatelogs that does this for you
• this program is typically in the same directory as apachectl
• you run it as rotatelogs filename rotationtime (in seconds; 86400
is one day)
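The numbered-suffix scheme described above (access_log, access_log.1, access_log.2, access_log.3) can be sketched as a small script. This illustrates the renaming idea only; it is not how rotatelogs itself works, the rotate function is hypothetical, and a live server would also need to reopen its log file, which the sketch ignores.

```python
import os
import tempfile

def rotate(path: str, keep: int = 3) -> None:
    """Shift path -> path.1 -> ... -> path.keep, deleting the oldest."""
    oldest = f"{path}.{keep}"
    if os.path.exists(oldest):
        os.remove(oldest)
    # Rename from the highest number down so nothing is overwritten.
    for i in range(keep - 1, 0, -1):
        src = f"{path}.{i}"
        if os.path.exists(src):
            os.rename(src, f"{path}.{i + 1}")
    if os.path.exists(path):
        os.rename(path, f"{path}.1")
    open(path, "w").close()  # start a new, empty log

# Demonstration in a throwaway directory: five "days" of logging.
with tempfile.TemporaryDirectory() as log_dir:
    log = os.path.join(log_dir, "access_log")
    for day in range(5):
        with open(log, "a") as f:
            f.write(f"entries from day {day}\n")
        rotate(log)
    print(sorted(os.listdir(log_dir)))
```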
22. Favicon.ico
• The favicon is an icon that is displayed in the
browser’s address bar next to the URL as the site’s
“logo” (you can also see these in bookmarks)
– the icon will reside in the web site’s home directory
(DocumentRoot)
• If a site does not have a favicon.ico in that directory,
typically the error and access logs fill up with error
messages
– you have three ways to prevent this
• create an icon and put it in this directory
• create a 0 byte file whose name is favicon.ico in this directory
• suppress the log messages as follows:
– SetEnvIf Request_URI favicon.ico favicon
– CustomLog logs/access_log common env=!favicon
• this says “for any request for favicon.ico, set the variable favicon
to true, and log only requests for which favicon is not set”
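The combined effect of the two directives can be pictured as a filter over incoming requests. The sketch below is an imitation only: should_log is a hypothetical function, the request list is made up, and the dot in the pattern is escaped here so that it matches a literal period.

```python
import re

# Pattern from the SetEnvIf line above, with the dot escaped.
FAVICON = re.compile(r"favicon\.ico")

def should_log(request_uri: str) -> bool:
    """Return True unless the request would set the favicon variable."""
    favicon_set = FAVICON.search(request_uri) is not None
    # env=!favicon means: log only when the variable is not set.
    return not favicon_set

# Made-up request paths.
requests = ["/index.html", "/favicon.ico", "/images/logo.gif"]
print([uri for uri in requests if should_log(uri)])
```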
23. Reporting Programs
• You want to search your log files for useful information
– how many people are visiting? what errors are arising? is the
same IP address sending numerous requests (e.g., a denial of
service attack)?
• wading through thousands of entries can be time
consuming
– you have many choices such as using awk or writing your own
shell scripts
• with awk, you could count the number of times each unique IP address
is found to see if you are being attacked
• with your own script, you could generate a report that lists all of the
404 errors by URL so that you could see if there are URLs that are
being misinterpreted by the users
– AWStats is a reporting tool that can dig through your file(s)
for useful information like trends, that you might want to
share with your marketing department – this is open source
software
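The two reports mentioned above (requests per IP address, and 404 errors by URL) can also be produced with a short script instead of awk. This is a sketch that assumes the access log uses the Common Log Format; the report function and the sample lines are made up for illustration.

```python
from collections import Counter

def report(lines):
    """Count requests per client IP and 404 errors per URL."""
    hits_per_ip = Counter()
    not_found = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) < 9:
            continue  # skip lines that are not in the expected format
        # Whitespace-split fields: 0 = host, 6 = URL, 8 = status.
        ip, url, status = parts[0], parts[6], parts[8]
        hits_per_ip[ip] += 1
        if status == "404":
            not_found[url] += 1
    return hits_per_ip, not_found

# Made-up sample entries.
sample_log = [
    '10.2.3.17 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.2.3.17 - - [10/Oct/2000:13:55:40 -0700] "GET /missing.html HTTP/1.0" 404 209',
    '10.2.3.99 - - [10/Oct/2000:13:56:01 -0700] "GET /index.html HTTP/1.0" 200 2326',
]
hits, missing = report(sample_log)
print(hits.most_common(5))     # busiest client addresses
print(missing.most_common(5))  # URLs that most often produce a 404
```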