Squid Proxy Log Format


In my previous post, I mentioned a custom Squid Proxy log format created by my friend and fellow blogger at Security Distractions, David Thejl-Clayton. I know that he spent a lot of time to define this and getting it parsed via Logstash into Elasticsearch to make data searchable. I have been using this very same approach in some of my labs, and therefore I thought I would give him some credit and also show how I have been using this.

Well, If we have configured our clients to use a proxy on the network, it can be an unbelievably valuable data source for security incidents – if we have enabled logging that is. Depending on the size of the network, we will also quickly be generating a vast amount of data, so being able to define exactly what we want written to log, is worth having a look at. This is exactly what the squid log format will let us do.

Squid Proxy is an open source content filter and caching engine, which can be found integrated into more well known proxying solutions on the market today. Squid Proxy comes bundled within pfSense and is something that features heavily in the lab I am building, which I will be writing more and more about over the next few months!

From a security perspective, there are some data that we would really like to include in our proxy logs. Things related to source and destination, but also things like request and responses, and any other details to help us answer if something malicious happened.

So let’s jump right into it, and get our hands a little dirty – Here is a log format that I have been using in some of my labs:

%ts.%03tu %6tr %>a %>p %03>Hs %<st %rm %ru %rp %rv %<a %<p %<A %mt %ssl::>sni "%{User-Agent}>h"

First, a few basics

Every new field begins with a percentage (%) symbol and the order they appear will be how they are written in the log. The numbers represent min/max field width, that can be used to control how much data you want to include e.g. in timestamps or to create space between field entries. I have not been looking too much into this, and in fact just used a few default settings. The bigger and less than (> <) symbols tells us something about the direction of the field value which we will get back to shortly. To get all this right, you will have to look at the documentation which I have provided a link to at the end the post. Lastly, the letters represent the actual field value that we want to log – so let’s start to have a look at them.

The good (and nerdy) stuff..

I divided the format into some smaller chunks and will walk through them step by step.

%ts.%03tu %6tr
1599026873.430     26

The first part here gives us a Unix timestamp of when an event occurred, followed by a dot and a maximum of three milliseconds precision. It is of course crucial to have timestamps in our log, as we need to know when something happened, so we can add things to a timeline and compare different entries over time.

The last field represents the response time in milliseconds. A bit of space is given between the timestamp to ease the view. One could perhaps argue if this information is needed or should be placed elsewhere, but I decided to leave it here for now.  

%>a %>p
192.168.100[.]100 56106

Next up is an IP and port number. Note the bigger than (>) sign here tells us a direction which in this case is the source IP and port. We are naturally interested in where a request came from if we suspect something malicious, so this is definitely two pieces of information that I expect being present.

%03>Hs %<st %rm
200 8860 GET

Here, the first field represents the three digit HTTP status code which was sent to the client/source. This will help our indication if something bad actually happened or not. Was the request successful? Perhaps we believe it was blocked? Any redirection or errors on the client or server side? This is another piece of information that I believe we should definitely include.

The following field is the total size of the reply, to see how much data the client received. At the end we have the request method (GET/POST etc.) to help us see what kind of requests was made, and over time help us learn the nature of the conversation.

%ru %rp %rv
https://www.securitydistractions.com/2020... /2020... 1.1

I also grouped these three fields together as they are somewhat related. The first field gives us the full URL requested by the client/source. Naturally, we are very interested in knowing exactly what was accessed or requested. Perhaps we are pivoting to or from this URL to see if other clients accessed it, which again can help us indicate if something actually happened.

The second field is simply the URL path without the host name. This is merely a convenient way to have this information sorted out separately into a field of its own, and will come in handy later in the process. At the end the protocol version is simply included as well.

%<a %<p
46.30.215[.]134 443

This we have seen this before. Pay attention to the less than (<) symbol which now tells us, that it is the destination IP and port. This gives us the ability to investigate and pivot to other sources of information for this IP. Also, the port information might give us a hint that this is something worth investigating, like traffic to non-standard ports etc.

%<A %mt %ssl::>sni
www.securitydistractions.com text/html www.securitydistractions.com

Almost done. These three covers server FQDN, the MIME content type and SNI from the client. Basicly we would want to know the FQDN to be able to act on this information and also have it sorted away from the URL which we recored earlier. The SNI is nice to know, if the client requested hostname is actually the same as the one accessed . The MIME type is an indication to the type of data that was accessed, and can help us validate our findings.

Mozilla/5.0 (Windows NT 10.0; Win64; x64)....

The last part here, as you might have guessed, is the observed User-Agent. The UA can help indicate what kind of application was used to access a resource. From a security perspective, looking at the UA alone an interesting observation is aggregation of less frequently used. If we know our infrastructure well, we should perhaps be able to rule out some of them, and then investigate the ones we are less certain of. Also, I think it makes sense to log the UA here, because were else would you have it?

The finale

Phew, that was it. Now, to actually use our custom squid format we need to put it into the squid.conf file. If you are using the squid proxy on pfSense, like me, this is very easy and you can do it using the WebUI. Go to ServicesSquid Proxy ServerShow Advanced Settings and then copy paste the format in like shown below. The first line defines the format, and the bottom line calls the new format to be used.

Now our logs will be written using our custom format, and we can start to work with them.

I hope this has been worth a while and gave some ideas on how you are able to customize your squid logs and why this is a great way to help benefit any needs that you might have. This is merely an example, and we could customize this much further and have more or less information included. I highly recommend having a look at the squid log format documentation which you can find here.

If you think this was interesting, stay tuned for a follow up. In one of my next blog posts I will be using the output created here and parse it using Logstash to start using the logs for some good. Until then, read my previous post about running filebeat on pfSense, to help ship logs off the pfSense box and on to Logstash.

Cheers – Michael

Leave a Reply

Your email address will not be published. Required fields are marked *