Simplifying Logstash by adding complexity

Background

A lot of the logs going into Logstash arrive via the Beats protocol, so you will typically have a pipeline in Logstash listening for Beats on port 5044. This could be events coming from Filebeat, Winlogbeat, Metricbeat, or Heartbeat.

In your Logstash filter section, you will over time end up with a huge mess, trying to add the relevant parsing for each log type inside a growing pile of if statements. In the output section you get the same mess again, where you route the different types of logs inside another pile of if statements.

If you have done things like this, your configuration becomes increasingly difficult to read and debug. Not to mention the problems you will face if multiple people need to contribute to the Logstash configuration, or if you need to move parsing of a specific type to another Logstash node: then you have to grab the relevant parts by copy/paste, which is error-prone.

input {
  beats {
    port => 5044
  }
}

filter {
  if [type] == "winlogbeat" {
    #enrich winlogbeat
    ....
  }
  if [type] == "heartbeat" {
    #enrich heartbeat
    ....
  }
  if [type] == "mylogfile" {
    #enrich mylogfile
    ....
  }
  if [type] == "dns" {
    #enrich dns
    ....
  }
  if [type] == "dhcp" {
    #enrich dhcp
    ....
  }
}

output {
  if [type] == "winlogbeat" {
    #output winlogbeat
    ....
  }
  if [type] == "heartbeat" {
    #output heartbeat
    ....
  }
  if [type] == "mylogfile" {
    #output mylogfile
    ....
  }
  if [type] == "dns" {
    #output dns
    ....
  }
  if [type] == "dhcp" {
    #output dhcp
    ....
  }
}

Simplifying

So what can you do about this problem, you may ask. In the past, people solved it by splitting the configuration into named conf files that Logstash would pick up and concatenate into one large configuration. However, we want to be modern and use the newer features made available by Elastic.
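For reference, the old approach looked something like this: you point a single pipeline at a directory glob in pipelines.yml, and Logstash concatenates every matching file into one large configuration (the layout below is just an example):

- pipeline.id: main
  path.config: "/etc/logstash/conf.d/*.conf"

/etc/logstash/conf.d/
  10-beats-input.conf
  20-filter-dns.conf
  21-filter-dhcp.conf
  90-outputs.conf

This keeps the individual files small, but the result is still one big pipeline, so all the if statements are still needed.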

Pipeline to pipeline

I read about the pipeline-to-pipeline feature in Logstash a long time ago. There is an excellent article about the options here. The feature became generally available in Logstash 7.4.

It’s actually very simple to implement. You create one pipeline to receive the beats input, and from there distribute the events to small, tailor-made pipelines.

input {
  beats {
    port => 5044
  }
}

filter {
}

output {
  if [type] == "dns" {
    pipeline { send_to => dns }
  } else if [type] == "dhcp" {
    pipeline { send_to => dhcp }
  } else if [type] == "mylogfile" {
    pipeline { send_to => mylogfile }
  } else {
    pipeline { send_to => fallback }
  }
}

Then create a new pipeline to handle each specific log type. This one is restricted to parsing DNS logs.

input {
  pipeline { address => dns }
}

filter {
   # do only your parsing of DNS logs
}

output {
  # output dns
}
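To make this concrete, a DNS pipeline could look something like the sketch below. The log format, field names, and index name here are purely hypothetical; substitute your own parsing:

input {
  pipeline { address => dns }
}

filter {
  # Hypothetical log line: "2019-10-04T12:00:00Z 10.0.0.5 example.com A"
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:ts} %{IP:client_ip} %{HOSTNAME:query} %{WORD:record_type}" }
  }
  # Use the timestamp from the log line as the event timestamp
  date {
    match => [ "ts", "ISO8601" ]
    remove_field => [ "ts" ]
  }
}

output {
  # Hypothetical destination
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "dns-%{+YYYY.MM.dd}"
  }
}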

You must remember to add all your pipelines to your pipelines.yml file. Also think about whether each pipeline needs an in-memory queue or a persisted queue.

- pipeline.id: beats-input
  path.config: "/etc/path/to/beats-input.config"
  pipeline.workers: 3
- pipeline.id: dns
  path.config: "/etc/different/path/dns.cfg"
  queue.type: persisted
  queue.max_bytes: 4gb
- pipeline.id: dhcp
  path.config: "/etc/different/path/dhcp.cfg"
  queue.type: persisted
  queue.max_bytes: 1gb
- pipeline.id: mylogfile
  path.config: "/etc/different/path/mylogfile.cfg"
  queue.type: persisted
  queue.max_bytes: 2gb
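Note that the distributor pipeline above also sends unmatched events to the fallback address, so that pipeline needs an entry in pipelines.yml and a small config of its own. A minimal sketch, where the path and the output are assumptions:

- pipeline.id: fallback
  path.config: "/etc/different/path/fallback.cfg"

input {
  pipeline { address => fallback }
}

output {
  # Hypothetical catch-all index, so unrouted events are not lost
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "fallback-%{+YYYY.MM.dd}"
  }
}

By default the pipeline output blocks if nothing is listening on the target address, so make sure the fallback pipeline is running before traffic arrives.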

Conclusion

We have started using this approach and will continue with it going forward. It gives us a much simpler way of handling many different log types inside Logstash, and it lets us distribute the work across more people.

On top of this, we are seeing better latency in Logstash. I suggest reading this article while you are at it: with this approach you are effectively running parallel pipelines, just as that article suggests.

As always, use this approach if you find it applicable to your use case.
