Apache Piped json logs

Our hosting team recently upgraded to Loggly's Gen2 service which required us to revisit our rsyslog configurations and make some adjustments.

First, we wanted to start taking advantage of Loggly's ability to parse JSON logs. We created a new apache logging format called jsonlog according to loggly's setup instructions (along with a couple extra headers we are interested in):

# JSON extended log format including: User-Agent, Referer, and X-Forwarded-For
LogFormat "{ \"time\":\"%t\", \"remoteIP\":\"%a\", \"host\":\"%V\", \"request\":\"%U\", \"query\":\"%q\", \"method\":\"%m\", \"status\":\"%>s\", \"userAgent\":\"%{User-agent}i\", \"referer\":\"%{Referer}i\", \"X-Forwarded-For\":\"%{X-Forwarded-For}i\" }" jsonlog

We can now setup a new CustomLog in our apache conf to use this jsonlog format as follows:

CustomLog /var/log/apache_access.json jsonlog

That was pretty easy but now we're sending even more data to loggly and reaching the limits of our plan's allocated bandwidth. We also aren't interested in capturing anything that is not an error or warning (we are still storing those on the server of course). We need to find a way to reduce any unnecessary data we're shipping and only send 4xx (warn) and 5xx (error) messages. This is where Apache's Piped Logging comes in handy.

In short, this feature allows you to pipe Apache log messages through an external process. This is extremely powerful and will allow us to use a program like grep to pattern match for only errors and warnings, which is what we want. Let's look at an example log entry:

{ "time":"[06/Nov/2013:13:15:14 -0500]", "remoteIP":"10.173.232.21", "host":"www.mywebsite.com", "request":"/some/link/that/doesnot/exist", "query":"", "method":"GET", "status":"404", "userAgent":"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0", "referer":"-", "X-Forwarded-For":"10.121.43.23, 74.34.63.10" }

If we use grep's extended version and inverse the match to capture everything that is NOT a 2xx (success) or 3xx (redirect) we do something like this:

/bin/grep -E --invert-match 'status.?[[:punct:]].?[23]' 

So now we just need append the output to a file. We simply pipe the output to cat which will append to our log file. Here is what the final piped log looks like in our apache conf:

CustomLog "|/bin/grep -E --invert-match 'status.?[[:punct:]].?[23]' |cat >> /var/log/httpd/errors.json" jsonlog

So if you add this, restart apache, and then visit a webpage on your server that doesn't exist, you should see the resulting 404 error show up as JSON in your new log file right? Wrong. The problem is grep will buffer the output (in our case up to 4096 bytes) before it writes anything. This is not very helpful, but if we add the --line-buffered to our grep command, we will get the unbuffered output. The final configuration looks like this:

As you can see Apache's Piped Logging feature is very powerful. This is just one example of how it can also be incredibly useful.

blog comments powered by Disqus
· By: