TF-CSIRT – What’s it all about?

I have recently taken a break from blogging to focus on other things before jumping back into my Incident Response 101 blog series. In the meantime, I want to write a little bit about TF-CSIRT and the reasons for joining a community like this. It is a process I am slowly becoming familiar with, and it definitely deserves more words written about it…

First off… What is TF-CSIRT?

The Task Force on Computer Security Incident Response Teams, or TF-CSIRT for short, was established by the European CERT community back in 2000. The idea was to create a community of incident response teams that could work together towards a common goal: sharing information and knowledge, assisting each other during incidents, and leveraging such a strong community in any other way that helps in the incident response world.

To provide access to the community, a service called the “Trusted Introducer Service” was created. This service provides a directory of the incident response teams that are members of TF-CSIRT. The Trusted Introducer Service acts as a clearing house, ensuring that members meet the correct requirements when joining, and then offering further processes for becoming accredited or certified TF-CSIRT members.

So what are the benefits?

The backbone of the TF-CSIRT community is the member database, where emergency contact details for each incident response team are displayed. This information can prove vital in an incident response situation. To maintain this vital community spirit, TF-CSIRT hosts regular conferences and meetups for its members; these are great for getting to know other teams and sharing knowledge.

Another huge benefit of TF-CSIRT lies within the certification process. This process imposes strict requirements based on the SIM3 maturity model, and essentially means that when you hit the magic certification level, you are one of the best prepared incident response teams in Europe (at least on paper). This is a standard that a lot of teams aspire to, but unfortunately never quite reach, usually due to time commitments.

The TF-CSIRT community also works very closely with FIRST (the Forum of Incident Response and Security Teams). This partnership helps deliver a yearly joint conference.

There are many other benefits to becoming a TF-CSIRT member, and I would highly recommend it!

So how do I join?

Joining TF-CSIRT is broken up into three different “memberships”, each with its own process.

Listed Member

The first process is to become a listed member. This means you become part of the community and your team gets listed in the TF-CSIRT database. It also means you can begin attending the European conferences and meetups that are offered.

To become a listed member, you need to fulfill some requirements:-

  1. You need to be sponsored by at least two teams that are already accredited or certified. A good idea here would be to look at the Trusted Introducer directory and see if you know any teams that have already gone through this process. The TF-CSIRT community is becoming larger and larger within Europe, so the chances are you already know the relevant teams to get the process moving.
  2. Get PGP/GPG keys for your team to communicate with TF-CSIRT. This one is a bit of a hassle: there is a long-running debate about using PGP, it can be quite difficult to get PGP supported within certain organizations, and ad-hoc processes may end up being needed to meet this requirement.

Once you have these two main requirements met, you simply fill out a form and email it to the Trusted Introducer email address and VOILA… Well, not quite VOILA: there is still an internal TF-CSIRT process in which various members vote on your membership. But after a period, you will find yourself a listed member!

Accredited Member

A lot of teams who aim for the certification membership will first need to become accredited members. By becoming accredited you receive access to the members-only part of the Trusted Introducer service, where you can see quite a lot of useful information about other teams in the directory which is not publicly available. Many teams reach this stage aiming for certification but, for multiple reasons, do not progress to that step. You should look at the accreditation step as saying “we are who we say we are”: an incident response team that wants more than simply being listed, and wants to show the community it means business.

To become accredited your team must:-

  1. Already be a listed member
  2. Publish an RFC 2350 description of your team (I will blog about this soon)
  3. Fill out a large amount of information about your team, its capabilities, and its service offerings

Once these requirements are met, the information is supplied to the Trusted Introducer team. This time it is not quite VOILA at all: there is a long process in which the information you have provided is vetted and assessed. The assessment takes around three months to complete and can result in further questions being asked by the Trusted Introducer team. After it is completed and you are accepted, you gain a shiny new status of “Accredited” within the directory!

Certified Member

Saving the best type of membership for last: a certified member is a team that has met the gold standard for incident response teams. They have adhered to the strict SIM3 model and achieved a maturity rating, set by the Trusted Introducer team, that essentially means “your team is one of the best in Europe at incident response” (on paper!).

The requirements to become certified:-

  1. Must already be an accredited member
  2. Have a positive SIM3 assessment based on current Trusted Introducer thresholds

The idea behind requirement 2 is that the team spends time assessing their current maturity within incident response. To do this they use the SIM3 model, something I will be blogging about very soon! The model is used to ensure that a team has all the necessary processes documented and in place, and that there is measurable maturity within those processes.

If the team discovers they are not quite ready after completing a SIM3 assessment, they can spend some time improving their processes and documentation to a higher standard. Another piece of low-hanging fruit is ensuring that the processes you define are signed off and audited by someone independent of your incident response team. Once you are confident you have met the correct maturity level within your documentation, you can apply to be certified.

A SIM3 auditor will then be appointed to you. This auditor will perform an onsite workshop at your location and audit all of your documented processes, interviewing certain team members and really digging deep to ensure that processes are not just something written on paper, but are understood too.

Once this audit is passed, your status will be changed within the directory to “Certified”, and you can go and show off to your friends! *cough* I mean constituents…

I may make certification sound like a long, drawn-out process, but how else could you achieve such an important gold standard without being audited externally and put before a committee that decides whether you are mature enough to be certified? Any other process like this would also take time. The benefits that come after being certified are huge: your constituents and management can be more confident that they are being served by a certified team.

Final words…

I hope that you learned something from this blog post. I have become familiar with the whole Trusted Introducer/TF-CSIRT grouping over the last two years, and I think it is incredibly exciting to be a part of this community. The certification process is also an incredible learning experience and will ensure that you really have everything in order to run your incident response team!

The Trusted Introducer website has far more details and interesting information about the processes, and can be found here:- https://www.trusted-introducer.org/index.html

My next blog post in this area will talk about the SIM3 model and how awesome it is for measuring the maturity of your incident response team…

Incident Response 101 – The Why?

In the previous post we discussed the background for my knowledge within incident response. Now we will jump into the exciting stuff and talk about “The Why?”

I guess a pretty good place to start in defining the incident response process is understanding why we need incident response at all.

Incident response wouldn’t exist without something to actually trigger the process. To trigger the process you need an incident, and what will generate that incident?

Threats

Incidents are generated by a threat, whether that threat is a nation state attacker, a script kiddie, a pandemic, or even some sort of natural disaster. So then what is a threat, and how do we define it?

I like to start out this explanation by showing the following diagram:-

Diagram by IncibeCERT

Intent + Capability + Opportunity = Threat

Each one of these conditions needs to be met to fulfill the criteria for a threat. To make it more understandable, I use the example of whether there is a threat at home from my child trying to steal Nutella from the cupboard.

Intent

Intent is pure and simple: does my child want the Nutella? Do they have the desire and drive to get it? Without intent, I could leave Nutella all over the house and not be worried about anything happening to it.

Capability

Does my child have the capability to get the Nutella? I may have left the cupboard door open, and my child may desperately want the Nutella, but they haven’t learned to open a jar yet. So the threat is not there…

Opportunity

Did I leave the Nutella jar open on the kitchen top? Now my child has the perfect chance to get hold of it. The opportunity has been given to them, and they can combine it with their intent and capability to create the threat!

Well what can we do about this?

You may look at these three points and think there is a lot that can be done to protect against each part of the “threat process”. But there isn’t… You cannot take actions to reduce the capabilities of your attackers yourself.

You also cannot influence an attacker’s intent against you. In some niche cases, you could argue that by “doing good things” you might reduce the intent, but this is a relatively difficult effect to measure.

So this leaves only “opportunity”, where you can have some sort of impact. I say “some sort of” because an attacker will always get an opportunity. An opportunity can be something as simple as a misconfigured firewall, a vulnerability in a public-facing server, and many more.

But you can do your best to restrict the number of opportunities presented to an attacker. A good example of this is vulnerability management: when an exploit or vulnerability is released and it affects you, taking action to patch or mitigate it can help reduce the attacker’s opportunity to become a threat.

But what about incident response?

You may be thinking, wait a minute, where does incident response fit into this? Incident response assumes that the attacker had the opportunity to become a threat and then carried out actions against you which have resulted in an incident needing to be handled. Incident response is a purely reactive process, and it is driven by threats.

In some cases the lessons learned from the root cause analysis within the incident response process can also help reduce attacker opportunities. An example of this… Imagine a perimeter firewall hole which is too wide and allows external access to a number of unpatched test servers. The subsequent incident, when an attacker compromises these servers, can lead to a report which identifies the broad firewall rule and gives advice on how to fix it, thus reducing the next attacker’s opportunity to become a threat!

Closing remarks…

In the next post we will look at how we can have an understanding of the threat landscape, and how to figure out which threats might be relevant to us…

Incident Response 101 – The Background

In the previous post, I gave an introduction to my planned set of blog posts around incident response.

But…

The first question is: how have I made it to this stage in my understanding of the incident response process? Which materials, courses, books, etc. have led me to develop my current knowledge level in this field? I will try to give a short description of each resource and why it is important…

All authors start with some background about themselves, so the audience trusts them a little more when they begin reading: “oh, this guy has read a lot, and is certified in xx and xx, they must know what they are talking about”.

This is a list of resources that I turn to at least once a week in my work within incident response.

Materials:-

FIRST CSIRT Services Framework 2.1

https://www.first.org/standards/frameworks/csirts/FIRST_CSIRT_Services_Framework_v2.1.0.pdf

It took me quite some time to find this document, and I was quite a way into my journey of building a Cyber Defence Center before I found it. But once I did, it answered so many of the outstanding questions I had. This document lays out flat what you need to do to deliver a large selection of services within the CSIRT world. It also opened a door to a large community for me, as I found the authors to be very interesting and the FIRST group a very welcome aid in my service architecture. I treat this document like the bible for the services I needed to build.

Just like any religious text, there is always room for interpretation. This resource is very good, but it does not answer every single question; in some areas it raises more questions, which require deeper research and more technically focused answers. We will touch on this later in the blog posts on this subject.

SIM3 – Security Incident Management Maturity Model

http://opencsirt.org/wp-content/uploads/2019/12/SIM3-mkXVIIIc.pdf

I started learning about the SIM3 model whilst beginning my research into joining the TF-CSIRT community (something we will look into in later blog posts). This model lays out the perfect foundation and building blocks you need to assemble an international-class incident response team. Attaining a good maturity rating within this model enables you to join the TF-CSIRT community and know that you have a very well-oiled incident response process. The SIM3 model was written by Don Stikvoort, who has also been highly influential in the FIRST CSIRT Services Framework.

This model is the gold standard for creating an incident response service, and I will reference it a lot throughout the blog posts coming up. It gives you some of the backbone structure that you need to build upon to create your own service.

Books:-

Intelligence-Driven Incident Response

http://shop.oreilly.com/product/0636920043614.do

I bought this book after attending the SANS FOR578 course that I mention below. I wanted a supplemental resource to aid my studies in Cyber Threat Intelligence, and this book went beyond my expectations. It really breaks down the incident response process in detail and shows where you can begin to look at it as a driver for gaining threat intelligence. This book really helped solve a problem I will discuss later, around “incident recording” language.

I recommend this book to everyone I meet within the incident response world.

MITRE – Ten Strategies of a World-Class Cybersecurity Operations Center

https://www.mitre.org/sites/default/files/publications/pr-13-1028-mitre-10-strategies-cyber-ops-center.pdf

This book is available for free from the link above. I was lucky enough to receive a printed copy from someone I met at the FOR578 training course. It goes into a lot of great detail on how to build a SOC and which resources you should look at to do it. Although the book was written back in 2014 and a lot has changed since then, it still holds a lot of relevance today. The section called “Strategy 4” is very useful for determining which functions an incident response team should have, and how they can be developed if needed.

Courses:-

SANS SEC401 – GSEC

https://www.sans.org/course/security-essentials-bootcamp-style

This course was the first non-vendor-focused training course I ever took; before this I was heavily focused on studying network security through the CCNA books. This course helped me understand that the security world was bigger than specific vendors’ offerings, and it opened the gates to my eventual drive into cyber security and incident response. For anyone starting out in this field, this course is very useful, as it is very broad and tries to cover most of the important topics in cyber security.

SANS FOR578 – Cyber Threat Intelligence

https://www.sans.org/course/cyber-threat-intelligence

If I look back at any course, or anything I have ever studied in general, this course holds the top honours for how much I learned. I went into it with an understanding of how I thought cyber security worked, and came out the other side with an entirely deeper knowledge and thought process. This course really helped me understand how powerful data absorbed from the incident response process can be, provided that it is organized into structures and frameworks that present it in a clear way. I also had the added bonus that the course was taught by Jake Williams (@malwarejake), whose anecdotes helped further my understanding of the materials. I would say this course was the tipping point that changed me from being a purely technically orientated person to being much more focused on process and structure. I do not have enough great words in my dictionary for this course!

Other resources:-

Don’t ever underestimate the value you can get from just talking to people, whether they are in the incident response field or in other fields. A great example is the crossover between incident response and incident management in the ITIL sense: essentially they are the same process and flow, just that incident response has the “cyber” tag.

Closing words…

This is just a list of the resources that I have used, and it is not complete; you need to find the bits you need from each of them and use them to define your own process.

I have also had the massive benefit of learning from some great people and spending time with organizations like CIRCL, Mandiant, and Red Canary, to name a few… I just try to absorb as much from the experts as possible…

Incident Response 101 – Intro

I have been wanting to write a set of blog posts about this for a while; possibly I will one day turn them into a book! But for now, they can live here.

Over the last year, I have given a few presentations and lectures about incident response, some of which live on our GitHub in the presentations folder. But they are not tied together, and they aren’t “alive” like a series of blog posts can be…

I would like to share a lot of the knowledge I have gained whilst working within this field and studying alongside it. A lot of the words in the next few blog posts come from the experience of delivering exactly what they describe.

A problem that I have found whilst trying to understand incident response deeply is that most incident response books, courses, and sales folk really focus on the deep technical parts of incident response… the forensics, the detections, the reverse engineering, the indicators of compromise, etc. The “sexy” analysis parts, and the easy sell. What I have been missing is a comprehensive guide to the underlying process behind the whole incident response stack.

Then it struck me: most of the people working within incident response are deeply technical and do get down and dirty with the analysis stage, but they aren’t really strong when it comes to the process, a process which is made up of far more stages than just analysis. This ends up creating a vacuum, where incident response seems highly expensive and complex to the outside observer.

So I have decided to write some blog posts addressed to the “2019 me”, to help others who are in my shoes: those who need to build something much more than just an analysis team, and those who need to architect the entire process, from alert to end report, so that it delivers great actionable results.

Creating detection rules in Elastic SIEM App

It has been quite a long time since I wrote my last blog post; as with everything, life gets in the way! But I have been spending some quiet time rebuilding my lab, I have upgraded my ELK stack to 7.6, and I am totally blown away by how awesome the Elastic SIEM app is. So I thought I would put together a few blog posts about how to use it!

Prerequisites

  • You must be running 7.6 (duh)…
  • You must be running at least the basic license.
  • You must be running at a minimum basic authentication within your setup, between Kibana, Elastic, Logstash etc.
  • You must be running TLS on Elastic.


Enabling each one of these prereqs takes time, and if you are using your stack just for testing purposes and haven’t set up TLS or auth before, then good luck! You are in the lucky position I was in last week; welcome to two days of work…
However once you are done, you are ready to move on to the real good stuff…
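For reference, the core of the auth/TLS prerequisites on the Elasticsearch side is a few settings in elasticsearch.yml. A minimal sketch, assuming certificates have already been generated (the keystore path is hypothetical, and a multi-node cluster also needs transport-layer TLS):

xpack.security.enabled: true                              # turn on authentication
xpack.security.http.ssl.enabled: true                     # TLS on the HTTP layer
xpack.security.http.ssl.keystore.path: "certs/http.p12"   # hypothetical certificate path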

The good stuff

We will use an example to aid the instructions. The example is based on creating a detection for each Windows Defender event ID 1116 (Malware Detected) entry in my logs.

First you will need to open the Elastic SIEM app, and then click on “Detections”.

Once you are in the detections window, on the right-hand side you will find “Manage signal detection rules”.

In the “Signal detection rules” window, you can see all the rules you currently have created or imported. You can manage whether rules are activated, and many other configuration changes can be made here.

To create a new rule, click on “Create new rule”.

Within the “Create new rule” section, the first thing you will need to do is define the index you wish the rule to point at, and then the query you want the rule to run. In this example, as I am splitting Defender logs into a separate index, I have chosen my “sd-defender” index, and my query is written in KQL (Kibana Query Language). The query uses the ECS (Elastic Common Schema) field event.code and will match when it finds event.code 1116. Once you have built this first part, click on “Continue”.
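For reference, the query itself is a one-liner in KQL against that ECS field:

event.code : "1116"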

The 2nd stage of building a rule is to add some more description to it…

Here you can name the rule and write a description of what it is/does. You also assign a “Severity” from low to critical, and a “Risk score” from 0-100. In this case I have chosen a “Severity” of high and a “Risk score” of 75. When you have finished in this section, click on “Continue”.

In this section you can also add some “Advanced settings”, where you can supply reference materials for the alert: if you created it from a blog post, or if it came from a Sigma rule, you could supply a URL here. You can also add some examples of false positives, and then enrich the rule with some MITRE ATT&CK TTPs! In this example we won’t add them, but I will be blogging again soon about how to do this part using Sigma rules!

The last part of rule creation is the “Schedule rule” section. Here you can set up how often you would like the rule to run and, when it does run, how far back in time it should look. This is interesting because if you have just created a new rule and would like to see how it would have performed over the last few days of logs, you can adjust that setting here. When you are done setting up the schedule, you can choose either “Create rule without activating it” or “Create and activate rule”; both options are pretty self-explanatory!

Once the rule is created, we can try to provoke it and see how it turns out… Head back to the “Detections” page of the SIEM app. In my example I am lucky, because it is my lab and there is nothing else going on…

Now we will trigger a malware detected alarm by downloading the EICAR test file to one of my lab machines.

BINGO!

And here is the alert landing in the “Signals” pane, from where we can begin the investigation. Right now there is not very much information about how these alerts can be brought to the attention of someone not using the SIEM app directly, but the SIEM app is an incredible offering, for free! I have also added a bonus item below on how to extract the alerts out to case management tools, Slack, etc.

Bonus bonus bonus

If you want to extract the alerts out of the SIEM app, you can use a tried and tested tool: Elastalert. The SIEM app uses a system index called “.siem-signals-default-00001”. This index can be read by Elastalert, and the alerts can make it out to your SOC team!
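As a sketch of what such a rule file could look like, assuming Elastalert is already pointed at your cluster in its global config (the index pattern, throttle, and Slack webhook URL are illustrative):

name: forward-siem-signals
type: any                        # fire on every new signal document
index: .siem-signals-default-*   # matches the system index above
filter: []                       # no extra filtering, forward everything
realert:
  minutes: 5                     # throttle repeated alerts
alert:
  - slack
slack_webhook_url: "https://hooks.slack.com/services/T000/B000/XXXX"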

We only need to append

Introduction

As Elasticsearch matures over time, Elastic is fixing some of the less obvious stuff. Seemingly little things can be tremendously important though.

One of the new things that I want to highlight here is the new security privilege: create_doc. You can read about it in the Elasticsearch 7.5 release notes.

As Elastic describes it:

With the previous set of index privileges, users that were allowed to index new documents were also allowed to update existing ones.

With the new create_doc, cluster administrators can create a user that is allowed to add new data only. This gives the minimum privileges needed by ingest agents, with no risk that that user can alter and corrupt existing logs. These administrators can (now) rest assured knowing that components that live directly on the machines that they monitor cannot corrupt or hide tracks that are already into the index.

Have a look at the documentation, as there is one important change needed in the Logstash Elasticsearch output section.

Implementing it

It is very easy to take advantage of this new feature. Create a role called append_writer and assign a user to the new role:

Or if you prefer the developer tools console:
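A minimal sketch of the two requests (the logs-* index pattern, user name, and password are illustrative; create_index is included so the writer can also create new daily indices):

POST /_security/role/append_writer
{
  "indices": [
    {
      "names": [ "logs-*" ],
      "privileges": [ "create_doc", "create_index" ]
    }
  ]
}

POST /_security/user/logstash_writer
{
  "password": "a-long-random-password",
  "roles": [ "append_writer" ]
}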

The final thing to modify is the output section in Logstash. You need to add an action attribute to it:
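A sketch of what that could look like, reusing the append_writer user from above (hosts and index name are illustrative):

output {
  elasticsearch {
    hosts => ["https://elastic01:9200"]
    index => "logs-%{+YYYY.MM.dd}"
    user => "logstash_writer"
    password => "${LOGSTASH_WRITER_PWD}"   # resolved from the Logstash keystore
    action => "create"                     # only ever create new documents
  }
}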

Of course, the credentials of the append_writer should be kept in the secret store of Logstash!

Conclusion

This simple change is trivial to make, but gives great value. You can rest assured that the user used in Logstash can never be used to change existing documents in your Elastic clusters.

Using Logstash @metadata

Introduction

In a previous post, I showed how to do a simple Kafka and Elasticsearch integration, using a single Kafka topic to carry many different types of logs into Elasticsearch.

Have a read if you want to catch up or haven’t read it.

That approach had an undesired side effect of putting attributes into Elasticsearch that are not needed, wasting precious disk space.

Metadata

However, there is a very simple and elegant way to fix this. Have a read of the description of Logstash metadata fields here.

The previous article suggested the approach below, which meant storing kafkatopic, myapp, and myrotation in every single document that went through the pipeline.

filter {
    mutate {
        copy => { "[@metadata][kafka][topic]" => "kafkatopic" }
    }

    if ![myapp] {
        mutate {
            add_field => { "myapp" => "default" }
        }
    }

    if ![myrotation] {
        mutate {
            add_field => { "myrotation" => "weekly" }
        }
    }
}
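As a side note, the [@metadata][kafka][topic] field is only populated when the Kafka input is told to decorate events. A minimal input sketch, with hypothetical broker and topic names:

input {
  kafka {
    bootstrap_servers => "kafka01:9092"   # hypothetical broker
    topics => ["mytopic"]                 # hypothetical topic
    codec => "json"
    decorate_events => true               # populates [@metadata][kafka][*]
  }
}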

If we convert to using metadata fields, it could look like this instead. No more kafkatopic, myapp, or myrotation being stored.

filter {
    if ![myapp] {
        mutate {
            add_field => { "myapp" => "default" }
        }
    }

    if ![myrotation] {
        mutate {
            add_field => { "myrotation" => "weekly" }
        }
    }

    # take advantage of metadata fields
    if [myapp] {
        mutate {
            rename => { "myapp" => "[@metadata][myapp]" }
        }
    }

    if [myrotation] {
        mutate {
            rename => { "myrotation" => "[@metadata][myrotation]" }
        }
    }
}

We can then use the metadata fields in the output section:

output {
    if [@metadata][myrotation] == "rollover" {
        elasticsearch {
            hosts => ["https://elastic01:9200", "https://elastic02:9200"]
            manage_template => false
            index => "%{[@metadata][kafka][topic]}-%{[@metadata][myapp]}-active"
        }
    }

    if [@metadata][myrotation] == "daily" {
        elasticsearch {
            hosts => ["https://elastic01:9200", "https://elastic02:9200"]
            manage_template => false
            index => "%{[@metadata][kafka][topic]}-%{[@metadata][myapp]}-%{+YYYY.MM.dd}"
        }
    }

    if [@metadata][myrotation] == "weekly" {
        elasticsearch {
            hosts => ["https://elastic01:9200", "https://elastic02:9200"]
            manage_template => false
            index => "%{[@metadata][kafka][topic]}-%{[@metadata][myapp]}-%{+xxxx.ww}"
        }
    }
}

Debugging

All outputs automatically remove the @metadata object, so if you are trying to debug your conf file, you need a simple trick to display the contents of @metadata.

output
{
  # also show contents of metadata object
  stdout { codec => rubydebug { metadata => true } }
}

Conclusion

So by using this approach we are no longer storing kafkatopic, myapp, and myrotation as attributes in every single document that passes through this pipeline.

We save disk space and processing time, and the documents are clean.

Simplifying Logstash by adding complexity

Background

A lot of the logs that go into Logstash arrive using the beats protocol, so you will typically have a pipeline in Logstash listening for beats on port 5044. This could be stuff coming from filebeat, winlogbeat, metricbeat, or heartbeat.

In your Logstash filter section, you will over time end up with a huge mess, trying to add the relevant parsing of logs inside a bunch of if statements. In the output section you could see the same mess again, where you output the different types of logs inside another bunch of if statements.

If you have done stuff like this, your code will become increasingly difficult to read and debug. Not to mention the problems you will face if multiple people need to be able to contribute to the configuration of Logstash, or if you need to move the parsing of a specific type to another Logstash node: then you need to grab the relevant parts by copy/paste, which is error-prone.

input {
  beats {
    port => 5044
  }
}

filter {
  if [type] == "winlogbeat" {
    #enrich winlogbeat
    ....
  }
  if [type] == "heartbeat" {
    #enrich heartbeat
    ....
  }
  if [type] == "mylogfile" {
    #enrich mylogfile
    ....
  }
  if [type] == "dns" {
    #enrich dns
    ....
  }
  if [type] == "dhcp" {
    #enrich dhcp
    ....
  }
}

output {
  if [type] == "winlogbeat" {
    #output winlogbeat
    ....
  }
  if [type] == "heartbeat" {
    #output heartbeat
    ....
  }
  if [type] == "mylogfile" {
    #output mylogfile
    ....
  }
  if [type] == "dns" {
    #output dns
    ....
  }
  if [type] == "dhcp" {
    #output dhcp
    ....
  }
}

Simplifying

So what can you do about this problem, you may ask? Earlier, people solved it with named conf files that would be picked up by Logstash to form one large configuration. However, we want to be modern and use the new features made available by Elastic.

Pipeline to pipeline

I read about the pipeline-to-pipeline feature in Logstash a long time ago. There is an excellent article about the options here. The feature became generally available in 7.4.

It’s actually very simple to implement. You create a pipeline file to receive the beats input and then distribute the events to small, tailor-made pipelines.

input {
  beats {
    port => 5044
  }
}

filter {
}

output {
        if [type] == "dns" {
          pipeline { send_to => dns }
        } else if [type] == "dhcp" {
          pipeline { send_to => dhcp }
        } else if [type] == "mylogfile" {
          pipeline { send_to => mylogfile }
        } else {
          pipeline { send_to => fallback }
        }
}

Then create a new pipeline to handle each specific log type. This one is restricted to parsing DNS logs.

input {
  pipeline { address => dns }
}

filter {
   # do only your parsing of DNS logs
}

output {
  # output dns
}

You must remember to add all your pipelines to your pipelines.yml file. Also remember to think about whether each pipeline needs an in-memory queue or a persisted queue.

- pipeline.id: beats-input
  path.config: "/etc/path/to/beats-input.config"
  pipeline.workers: 3
- pipeline.id: dns
  path.config: "/etc/different/path/dns.cfg"
  queue.type: persisted
  queue.max_bytes: 4gb
- pipeline.id: dhcp
  path.config: "/etc/different/path/dhcp.cfg"
  queue.type: persisted
  queue.max_bytes: 1gb
- pipeline.id: mylogfile
  path.config: "/etc/different/path/mylogfile.cfg"
  queue.type: persisted
  queue.max_bytes: 2gb
# the beats output above also routes unmatched events to a fallback pipeline
- pipeline.id: fallback
  path.config: "/etc/different/path/fallback.cfg"

Conclusion

We have started using this approach and will keep doing so going forward. We get a much simpler way of handling many different log types inside Logstash, and we are able to distribute the work across more people.

On top of this, we are seeing better latency in Logstash. I suggest reading this article while you are at it: with this approach you are effectively using parallel pipelines, just like the article suggests.

As always, use this approach if you find it applicable to your use case.

Watching for no data

Introduction

So you are sending stuff to your Elasticsearch cluster with some beat, e.g. filebeat. But as everyone knows, things go wrong, stuff breaks. You are trying to be proactive and watch for stuff breaking, so why not let Elasticsearch monitor for missing data with a watcher? You go in search of some examples, and pretty surely you will end up at this repo: https://github.com/elastic/examples

The examples repo

This repo provides examples of how to do various stuff with your shining Elasticsearch setup. If you look in the alerting category, you will find a recipe called “system fails to provide data”. Oh yeah…

It looks pretty useful. Basically you are setting up a watcher that searches an index for hosts seen in the last 24 hours and for hosts seen in the last hour. However, there is a catch: the sample doesn’t provide any example of how to compute the delta. You just end up with two lists that you have little use for 😉
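For orientation, the watcher’s search aggregates hosts per period in roughly this shape (a sketch only; the actual recipe differs in detail, and the host.name field is an assumption that depends on your beats):

"aggs": {
  "periods": {
    "filters": {
      "filters": {
        "history": { "range": { "@timestamp": { "gte": "now-24h" } } },
        "last_period": { "range": { "@timestamp": { "gte": "now-1h" } } }
      }
    },
    "aggs": {
      "hosts": {
        "terms": { "field": "host.name", "size": 1000 }
      }
    }
  }
}

The transform below walks exactly this structure: periods.buckets.history.hosts and periods.buckets.last_period.hosts.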

The revised sample

Every chance I get, when I talk to my friends at Elastic, I tell them the watcher is too hard to use. Make it simpler, please. And they smile and say, “we know” 🙂

So back to the problem.

You have to do some very funky-looking painless scripting to find the delta of those two lists we started out with. You do this by means of a transform.

This is how the transform section looks in the repo. It is basically empty, so there will be no transform going on.

  "actions": {
    "log": {
      "transform": {
      "script": {
        "id":"transform"
      }
    },
      "logging": {
        "text": "Systems not responding in the last {{ctx.metadata.last_period}} minutes:{{#ctx.payload._value}}{{.}}:{{/ctx.payload._value}}"
      }
    }
  }

So this is my attempt to fix this problem. Don’t get scared, it is not as bad as it looks. Just add it to the watcher.

  "transform": {
    "script": {
      "source": "def last_period = ctx.payload.aggregations.periods.buckets.last_period.hosts.buckets.stream().map(p -> p.key ).collect(Collectors.toList());def history = ctx.payload.aggregations.periods.buckets.history.hosts.buckets.stream().map(e -> e.key ).filter(p -> !last_period.contains(p)).map(p -> [ 'hostname':   p]).collect(Collectors.toList());return  history;",
      "lang": "painless"
    }
}

The source code laid out in a more readable format. Multiline painless scripts in the watcher UI, please, Elastic 😀

def last_period = ctx.payload.aggregations.periods.buckets.last_period.hosts.buckets.
  stream().
    map(p -> p.key ).
      collect(Collectors.toList());

def history = ctx.payload.aggregations.periods.buckets.history.hosts.buckets.
  stream().
    map(e -> e.key ).
      filter(p -> !last_period.contains(p)).
        map(p -> [ 'hostname':   p]).
          collect(Collectors.toList());

return  history;

That code will make a nice list of the hosts that haven’t delivered data in the last period.

To use the list in the action section, you do something like this. Notice the condition in there as well, to prevent the watcher going off and sending emails when everything is working:

  "actions": {
    "log": {
      "condition": {
        "compare": {
          "ctx.payload._value.0": {
            "not_eq": null
          }
        }
      },
      "email": {
        "profile": "standard",
        "to": [
          "whoever@whatever.com",
        ],
        "subject": "oh no , data missing",
        "body": {
          "html": "<h1>Systems not delivering data in the last {{ctx.metadata.last_period}} perid</h1>  <ul> {{#ctx.payload._value}}<li>{{hostname}}</li>{{/ctx.payload._value}}</ul>"
        }
      }
    }
  },

Conclusion

As usual, there are more ways to achieve the same thing; you could probably do an extremely complex search instead. But if you add these two sections to your watcher, you are good to go.

TheHive enrichment

Intro

An increasing number of SOCs, IRT teams, etc. are beginning to use TheHive and Elasticsearch.

While researching these tools, I saw a lot of talk about enrichment and tying various tools together, so I wanted to provide my take on it as well.

I am by no means an expert in any of these tools, or in the IRT process, but I have had the privilege of getting to know a few people that I would consider experts (even though they might not feel that way themselves), and while watching them work, I started thinking that some of the tasks they routinely perform could be eligible for automation.

Specifically, I saw that a lot of the time when they were doing triage or incident response, they would receive an alert (this could be from their EDR tool, tier 1 SOC, IDS/IPS, etc.) in which they would only be provided with an IP address and a timestamp.

Because most corporate infrastructures are configured with DHCP, they would often have to go look at their Elasticsearch logs to determine which endpoint (hostname) was assigned the given IP address at the given time.

While this is somewhat trivial to do, it is also a well-defined, recurring task, which meant that (if possible) I wanted to see if I could automate it.
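The manual lookup itself boils down to a query like this (a sketch; the index pattern, field names, and values are hypothetical and depend on how your DHCP logs are mapped):

GET dhcp-*/_search
{
  "size": 1,
  "query": {
    "bool": {
      "filter": [
        { "term": { "client.ip": "10.1.2.3" } },
        { "range": { "@timestamp": { "lte": "2020-06-01T10:15:00Z" } } }
      ]
    }
  },
  "sort": [ { "@timestamp": "desc" } ]
}

The newest lease event at or before the alert timestamp then tells you which hostname held the address.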

Integrating TheHive and Elasticsearch

As you may or may not know, TheHive uses an underlying enrichment engine called Cortex.

In short, Cortex works by leveraging analyzers (used for collecting information related to an observable, for instance collecting information from VirusTotal in relation to a checksum) and responders (used to act on information, for instance pushing an IP to a blacklist, or sending an email out).

With this in mind, I figured that the way to go would be to create an analyzer that would be able to query Elasticsearch and return the hostname that was using the given IP address at the specified time.

I figured that the way to do this would be to create the event in TheHive and attach the given IP address as an observable, from which the analyzer could be run.

This however turned out to be somewhat of a dead end for me, as analyzers have the caveat of only working on observables. This meant that the only way I could provide a timestamp to the analyzer was to manually type it into the message field of the observable (which I briefly considered, but ended up deciding would be way too error-prone in a production environment, as the timestamp would have to adhere to specific formatting rules).

Because of this caveat, I started looking at the possibilities of implementing this as a responder instead (even though this is not how responders are supposed to be used).

I quickly realized that because responders can be invoked on events, alerts, and observables, a responder has access to a wide range of information related to the event, even if it is implemented to only work with observables.

With this in mind, I was able to implement functional timestamps using customFields with datatype datetime:

This meant that I was able to implement a functional responder which was able to query Elasticsearch (through the standard REST API) and return a report containing all the relevant entries corresponding to the query.

I, however, was not entirely satisfied by this, as I felt it could only be considered partial automation: I would still have to read through the returned report and manually input the results as new observables.

Completing the automation

Using Cortex, I felt quite limited in what I could do with my results, so I started contemplating how to take my attempted automation a step further, and therefore I started looking into the REST API for TheHive.

This gave me all the possibilities I wanted, and with it I was able to leverage another customField called autoEnrichment (with datatype boolean) to define whether I wanted the responder to automatically create new observable(s) from the Elasticsearch results.

The actual code

Analyzers and responders usually consist of the following:

  • A requirements file (which defines the non-standard libraries needed for the analyzer/responder to work)
  • A JSON file (defining the prerequisites for the responder/analyzer, such as which datatypes it can work with; see the sketch after this list)
  • The analyzer/responder itself (the actual code that performs the required operations)
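To give an idea of the shape, a responder definition could look like this (a generic sketch of the Cortex format; all values are illustrative, not the actual file from the repo):

{
  "name": "DHCPResponder",
  "version": "1.0",
  "author": "securitydistractions",
  "url": "https://github.com/securitydistractions/ElasticSearch-CortexResponder",
  "license": "AGPL-V3",
  "description": "Look up which hostname held a given IP address at a given time",
  "dataTypeList": ["thehive:case_artifact"],
  "command": "DHCPResponder/DHCPResponder.py",
  "baseConfig": "DHCPResponder",
  "configurationItems": [
    {
      "name": "elasticsearch_url",
      "description": "URL of the Elasticsearch cluster holding the DHCP logs",
      "type": "string",
      "multi": false,
      "required": true
    }
  ]
}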

I, however, chose to split the actual analyzer/responder file into 3 separate files (DHCPResponder.py, DHCPConf.py, and DHCPCallScript.py).

The idea behind this is to separate the initialization, the configurable items, and the functionality, in an attempt to make the responder easier to maintain and easier to build upon, in case a need for a similar responder which can handle other types of logs should arise.

In keeping with the spirit of maintainability (and best practice), I have also tried to document the code with comments explaining the functionality and the thoughts behind each code section, and as such most of the code should be somewhat self-explanatory…

So without further ado, here is a link to the GitHub repo with the code:

https://github.com/securitydistractions/ElasticSearch-CortexResponder