Our hosting team recently upgraded to Loggly's Gen2 service, which required us to revisit our rsyslog configurations and make some adjustments.
First, we wanted to start taking advantage of Loggly's ability to parse JSON logs. We created a new Apache logging format called jsonlog according to Loggly's setup instructions (along with a couple of extra headers we are interested in):
# JSON extended log format including: User-Agent, Referer, and X-Forwarded-For
LogFormat "{ \"time\":\"%t\", \"remoteIP\":\"%a\", \"host\":\"%V\", \"request\":\"%U\", \"query\":\"%q\", \"method\":\"%m\", \"status\":\"%>s\", \"userAgent\":\"%{User-agent}i\", \"referer\":\"%{Referer}i\", \"X-Forwarded-For\":\"%{X-Forwarded-For}i\" }" jsonlog
We can now set up a new CustomLog in our Apache conf to use this jsonlog format as follows:
CustomLog /var/log/apache_access.json jsonlog
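Keep in mind that Apache only picks up the new LogFormat and CustomLog after a reload. A quick way to check and apply the change (a sketch; the exact command name varies by distribution, e.g. apache2ctl on Debian-based systems):
# Verify the config syntax, then reload Apache gracefully
apachectl configtest
apachectl graceful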
That was pretty easy, but now we're sending even more data to Loggly and reaching the limits of our plan's allocated bandwidth. We also aren't interested in capturing anything in Loggly that is not an error or warning (the full access logs are still stored on the server, of course). We need a way to cut the unnecessary data we're shipping and send only 4xx (warning) and 5xx (error) messages. This is where Apache's Piped Logging comes in handy.
In short, this feature allows you to pipe Apache log messages through an external process. This is extremely powerful and will allow us to use a program like grep to pattern match for only errors and warnings, which is what we want. Let's look at an example log entry:
{ "time":"[06/Nov/2013:13:15:14 -0500]", "remoteIP":"10.173.232.21", "host":"www.mywebsite.com", "request":"/some/link/that/doesnot/exist", "query":"", "method":"GET", "status":"404", "userAgent":"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0", "referer":"-", "X-Forwarded-For":"10.121.43.23, 74.34.63.10" }
If we use grep's extended regular expressions and invert the match to capture everything that is NOT a 2xx (success) or 3xx (redirect), we can do something like this:
/bin/grep -E --invert-match 'status.?[[:punct:]].?[23]'
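Before wiring this into Apache, it's easy to sanity-check the pattern on the command line. A quick sketch using trimmed-down entries (real entries look like the one above):
# A 404 is NOT a 2xx/3xx, so the inverted match lets it through (line is printed)
echo '{ "status":"404" }' | /bin/grep -E --invert-match 'status.?[[:punct:]].?[23]'
# A 200 matches the 2xx/3xx pattern, so the inverted match drops it (no output)
echo '{ "status":"200" }' | /bin/grep -E --invert-match 'status.?[[:punct:]].?[23]'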
So now we just need to append the output to a file. We simply pipe grep's output to cat, redirecting it with >> so it appends to our log file. Here is what the final piped log looks like in our Apache conf:
CustomLog "|/bin/grep -E --invert-match 'status.?[[:punct:]].?[23]' |cat >> /var/log/httpd/errors.json" jsonlog
So if you add this, restart Apache, and then visit a webpage on your server that doesn't exist, you should see the resulting 404 error show up as JSON in your new log file, right? Wrong. The problem is that grep buffers its output (in our case up to 4096 bytes) before it writes anything, which is not very helpful here. If we add the --line-buffered flag to our grep command, the output is flushed after every matching line instead. The final configuration looks like this:
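CustomLog "|/bin/grep -E --line-buffered --invert-match 'status.?[[:punct:]].?[23]' |cat >> /var/log/httpd/errors.json" jsonlog
With line buffering in place, requesting a page that doesn't exist writes its 404 entry to /var/log/httpd/errors.json immediately.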
As you can see, Apache's Piped Logging feature is very powerful. This is just one example of how it can be put to good use.