Squid Proxy Log Format


In my previous post, I mentioned a custom Squid Proxy log format created by my friend and fellow blogger at Security Distractions, David Thejl-Clayton. I know he spent a lot of time defining it and getting it parsed via Logstash into Elasticsearch to make the data searchable. I have been using this very same approach in some of my labs, so I thought I would give him some credit and also show how I have been using it.

Well, if we have configured our clients to use a proxy on the network, it can be an unbelievably valuable data source for security incidents – if we have enabled logging, that is. Depending on the size of the network, we will also quickly generate a vast amount of data, so being able to define exactly what we want written to the log is worth a closer look. This is exactly what the squid log format lets us do.

Squid Proxy is an open source content filter and caching engine that can be found integrated into better-known proxying solutions on the market today. Squid Proxy comes bundled with pfSense and features heavily in the lab I am building, which I will be writing more and more about over the next few months!

From a security perspective, there is some data we would really like to include in our proxy logs: things related to source and destination, but also requests and responses, and any other details that help us answer whether something malicious happened.

So let’s jump right into it and get our hands a little dirty – here is a log format that I have been using in some of my labs:

%ts.%03tu %6tr %>a %>p %03>Hs %<st %rm %ru %rp %rv %<a %<p %<A %mt %ssl::>sni "%{User-Agent}>h"

First, a few basics

Every field begins with a percentage (%) symbol, and the order in which the fields appear is the order they are written to the log. The numbers represent min/max field widths, which can be used to control how much data to include, e.g. in timestamps, or to create space between field entries. I have not looked too much into this and in fact just used a few default settings. The greater-than and less-than (> <) symbols tell us something about the direction of the field value, which we will get back to shortly. To get all this right, you will have to look at the documentation, which I have linked to at the end of the post. Lastly, the letters represent the actual field values that we want to log – so let’s start to have a look at them.
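The width specifiers behave much like printf-style padding, which is an easy way to get a feel for them – e.g. the 03 in %03tu zero-pads to three digits, and the 6 in %6tr right-aligns the value in a field at least six characters wide:

```shell
printf '%03d\n' 7    # zero-padded to three digits, like the 03 in %03tu -> 007
printf '%6d\n' 26    # right-aligned in a six character field, like the 6 in %6tr
```

This is just an analogy to illustrate the padding behaviour; the authoritative list of flags and widths is in the squid logformat documentation.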

The good (and nerdy) stuff..

I divided the format into some smaller chunks and will walk through them step by step.

%ts.%03tu %6tr
1599026873.430     26

The first part gives us a Unix timestamp of when an event occurred, followed by a dot and the milliseconds with three-digit precision. It is of course crucial to have timestamps in our log: we need to know when something happened so we can place entries on a timeline and compare them over time.
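If you ever need to check a raw epoch timestamp by hand, converting it back to a human-readable date is a one-liner (GNU date shown below; on FreeBSD/pfSense the equivalent flag is -r, i.e. date -u -r 1599026873):

```shell
# Convert the epoch seconds from the example above to a UTC date (GNU date):
date -u -d @1599026873 '+%Y-%m-%d %H:%M:%S'
# prints: 2020-09-02 06:07:53
```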

The last field is the response time in milliseconds, padded to a minimum width of six characters to put a bit of space after the timestamp and ease the view. One could perhaps argue whether this information is needed or should be placed elsewhere, but I decided to leave it here for now.

%>a %>p
192.168.100[.]100 56106

Next up are an IP address and a port number. Note that the greater-than (>) sign here indicates direction; in this case it is the source IP and port. We are naturally interested in where a request came from if we suspect something malicious, so these are definitely two pieces of information I expect to be present.

%03>Hs %<st %rm
200 8860 GET

Here, the first field is the three-digit HTTP status code that was sent back to the client/source. This helps us judge whether something bad actually happened. Was the request successful? Do we believe it was blocked? Any redirection, or errors on the client or server side? This is another piece of information I believe we should definitely include.

The following field is the total size of the reply, so we can see how much data the client received. At the end we have the request method (GET/POST etc.), which shows what kind of requests were made and, over time, helps us learn the nature of the conversation.

%ru %rp %rv
https://www.securitydistractions.com/2020... /2020... 1.1

I grouped these three fields together as they are somewhat related. The first gives us the full URL requested by the client/source. Naturally, we are very interested in knowing exactly what was accessed or requested. Perhaps we are pivoting to or from this URL to see if other clients accessed it, which again can help us establish whether something actually happened.

The second field is simply the URL path without the host name. This is merely a convenient way to have this information sorted into a field of its own, which will come in handy later in the process. Finally, the HTTP protocol version is included as well.

%<a %<p
46.30.215[.]134 443

We have seen this before. Pay attention to the less-than (<) symbol, which now tells us that this is the destination IP and port. This gives us the ability to investigate and pivot to other sources of information for the IP. The port might also hint that something is worth investigating, like traffic to non-standard ports etc.

%<A %mt %ssl::>sni
www.securitydistractions.com text/html www.securitydistractions.com

Almost done. These three cover the server FQDN, the MIME content type and the SNI from the client. Basically, we want to know the FQDN so we can act on this information, and have it sorted away from the URL we recorded earlier. The SNI is nice to have, to check whether the hostname the client requested is actually the same as the one accessed. The MIME type indicates the type of data that was accessed and can help us validate our findings.

"%{User-Agent}>h"
Mozilla/5.0 (Windows NT 10.0; Win64; x64)....

The last part, as you might have guessed, is the observed User-Agent. The UA can help indicate what kind of application was used to access a resource. From a security perspective, an interesting approach when looking at UAs alone is to aggregate the least frequently used ones. If we know our infrastructure well, we should be able to rule some of them out, and then investigate the ones we are less certain of. Also, I think it makes sense to log the UA here, because where else would you have it?

The finale

Phew, that was it. Now, to actually use our custom squid format we need to put it into the squid.conf file. If you are using the squid proxy on pfSense, like me, this is very easy and you can do it in the WebUI. Go to Services > Squid Proxy Server > Show Advanced Settings and paste the format in as shown below. The first line defines the format, and the bottom line tells squid to use the new format.
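For reference, those two lines in raw squid.conf syntax would look something like this – the format name securitydistractions is just a label I picked, and the log path is where the pfSense squid package keeps its logs by default, so adjust both to your setup:

```
logformat securitydistractions %ts.%03tu %6tr %>a %>p %03>Hs %<st %rm %ru %rp %rv %<a %<p %<A %mt %ssl::>sni "%{User-Agent}>h"
access_log /var/squid/logs/access.log securitydistractions
```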

Now our logs will be written using our custom format, and we can start to work with them.
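As a quick sanity check that the format is easy to work with, here is a hypothetical log line in this format (with the User-Agent shortened so it stays a single quoted field) and an awk one-liner pulling out the source IP, method, URL and destination:

```shell
# A made-up log line following the custom format; fields are space-delimited:
line='1599026873.430 26 192.168.100.100 56106 200 8860 GET https://www.securitydistractions.com/ / 1.1 46.30.215.134 443 www.securitydistractions.com text/html www.securitydistractions.com "Mozilla/5.0"'
# Slice out source IP ($3), method ($7), URL ($8) and destination IP:port ($11:$12):
echo "$line" | awk '{print $3, $7, $8, $11":"$12}'
# prints: 192.168.100.100 GET https://www.securitydistractions.com/ 46.30.215.134:443
```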

I hope this has been worthwhile and gave you some ideas on how to customize your squid logs, and why this is a great way to meet whatever needs you might have. This is merely an example; we could customize it much further and include more or less information. I highly recommend having a look at the squid log format documentation, which you can find here.

If you think this was interesting, stay tuned for a follow-up. In one of my next blog posts I will take the output created here and parse it using Logstash, to start putting the logs to good use. Until then, read my previous post about running filebeat on pfSense, which helps ship logs off the pfSense box and on to Logstash.

Cheers – Michael

Filebeat 7.8 on pfSense 2.4.5


Hey and welcome to this blog post. My name is Michael Pedersen, I am 34 years old and I love open source and security stuff. In August 2019 I started my journey into the Danish InfoSec community, and along the way I came across Security Distractions, because they too love open source and security! Now, one year later, I have been invited to join the blog, and I am very pleased to publish this first post and hopefully have many more to come. So again – welcome πŸ™‚


On many of my projects I often need a firewall to segment networks, apply different rules, intercept traffic etc. For this I always use pfSense, because it’s easy to set up, has a ton of functionality, is very well documented and of course free and open source, which we like.

But unfortunately this is not a blog post about pfSense and how awesome and fun it is to play around with. Actually, this is more of a fusion of different sources that I have put together, to solve a problem I was facing – so let us get started.

For one of my projects I was going to intercept traffic from a client network using pfSense and the squid proxy service, and have all traffic written to the squid access log. The second step was to ship the logs into an Elasticsearch instance, and the best way to do this is with Elastic Beats – in this case, Filebeat. This is where the challenge awaited.

The challenge…

As you probably know, pfSense runs on FreeBSD, and at the moment the package manager does not provide any easy install of Elastic components. I searched the internet for a while without much luck, until I stumbled upon this blog post written by a Swedish pentester called Christoffer Claesson. He had found an answer within an ongoing GitHub issue, but felt it was too good not to have on his own blog – I totally agree. The original idea was posted by another guy called jakommo, which you can read here.

Now, jakommo’s idea was straightforward: simply download the GitHub repository and build the beat you want yourself. Hell, why did I not think of that! Christoffer and jakommo already have some of the steps covered, but I would like to follow up on them and add a few things.

There are a few prerequisites that you need to get up and running before you start.


  1. Get a fresh FreeBSD machine up and running, preferably the same version as your pfSense box.
  2. Make sure you can reach your FreeBSD box via SSH. You might need to edit /etc/ssh/sshd_config and allow root login (not best practice – I know).
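For the second prerequisite, the relevant line in sshd_config looks like this (a lab-only shortcut – do not leave this enabled on anything that matters):

```
# /etc/ssh/sshd_config on the FreeBSD build machine
PermitRootLogin yes
```

Restart sshd afterwards with service sshd restart so the change takes effect.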

Let’s go…

The following steps update the pkg repository and install the tools needed to clone the filebeat repository and build it from source. This will also install a bunch of dependencies, which is why it is done on a separate FreeBSD machine, to avoid polluting the pfSense box.

root@freebsd:~ # pkg update -f 
root@freebsd:~ # pkg install git gmake go bash
root@freebsd:~ # git clone https://github.com/elastic/beats.git
root@freebsd:~ # cd beats/filebeat/
root@freebsd:~/beats/filebeat/ # gmake

When the process is done, you should be able to run the freshly built binary, for example by checking its version:

root@freebsd:~/beats/filebeat/ # ./filebeat version

Now step back one dir, and create an archive of the filebeat folder.

root@freebsd:~/beats/filebeat/ # cd ..
root@freebsd:~ # tar -czvf filebeat.tar.gz ./filebeat/

This concludes the work on our temporary FreeBSD box; now we have to copy the archive to our local machine so we can get it onto our pfSense box. You can do this any way you like – I prefer to use secure copy from my local machine, for example (adjust host names to your setup):

user@local:~ $ scp root@freebsd:filebeat.tar.gz .

Next up, we need to upload our filebeat archive to our pfSense box. This can be done the same way, or you can use the WebUI like this:

In the pfSense WebUI, go to Diagnostics > Command Prompt, then browse to the archive on your local machine and upload it.

All right, we are almost there. Log into your pfSense box using the console or SSH. If you used the WebUI to upload the archive, you will find the file in the /tmp/ folder. Now you can move it to wherever you want on pfSense and extract the archive. For this tutorial I simply stayed inside the /tmp/ dir.

root@pfSense:/tmp # tar -xzvf filebeat.tar.gz
root@pfSense:/tmp # cd filebeat && ./filebeat version

And there you have it. All you have to do now is configure filebeat to fit your needs and run ./filebeat.
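To get started, a minimal filebeat.yml could look something like the sketch below. The squid log path is where the pfSense squid package writes its access log by default, while the Logstash host is of course an assumption you should replace with your own:

```yaml
# Minimal filebeat.yml sketch: tail the squid access log and ship it to Logstash.
filebeat.inputs:
  - type: log
    paths:
      - /var/squid/logs/access.log   # default squid log location on pfSense

output.logstash:
  hosts: ["192.168.100.10:5044"]     # assumed Logstash host/port - adjust to yours
```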

Caveats… Always with the caveats…

  • This has only been tried in a virtual environment, but I can’t see why it should not work on hardware as well.
  • When you start filebeat it will run in the foreground of the shell you are using. I will try to do a post on how to get it running as a service instead.
  • I have tried this with Filebeat 7.8 and the master branch (8.0 as of this writing).

Outro… (you still listening?)

Lastly, I want to give a shout out to my friend David Thejl-Clayton and his custom squid log format, which you can find in our GitHub repo here. David has spent a great deal of time defining a custom log format with all the relevant fields critical for detection and analysis of proxy alerts.

Cheers – Michael