TheHive enrichment


A growing number of SOCs, IRTs, and similar teams are beginning to use TheHive and ElasticSearch.

While researching these tools, I saw a lot of talk about enrichment and about tying various tools together, so I wanted to provide my take on it as well.

I am by no means an expert in any of these tools, or in the IRT process, but I have had the privilege of getting to know a few people whom I would consider experts (even though they might not feel that way themselves). While watching them work, I started thinking that some of the tasks they routinely perform could be eligible for automation.

Specifically, I saw that when they were doing triage or incident response, they would often receive an alert (from their EDR tool, tier 1 SOC, IDS/IPS, etc.) that provided only an IP address and a timestamp.

Because most corporate infrastructures are configured with DHCP, they would often have to go through their ElasticSearch logs to determine which endpoint (hostname) was assigned the given IP address at the given time.

While this is somewhat trivial to do, it is also a well-defined, recurring task, which meant that I wanted to see if I could automate it.
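To make the lookup concrete, here is a minimal sketch of the kind of query involved: find log entries for one IP address within a time window around the alert. The field names (`client.ip`, `@timestamp`), the window size and the result limit are assumptions for illustration; they depend entirely on how your DHCP logs are indexed.

```python
from datetime import datetime, timedelta, timezone


def build_dhcp_query(ip, when, window_minutes=60,
                     ip_field="client.ip", time_field="@timestamp"):
    """Build an ElasticSearch query body matching entries for `ip`
    within +/- `window_minutes` of `when`.

    The field names are assumptions; use whatever your logs are indexed as.
    """
    delta = timedelta(minutes=window_minutes)
    return {
        "size": 50,
        # Newest entries first, so the most recent lease is at the top.
        "sort": [{time_field: {"order": "desc"}}],
        "query": {
            "bool": {
                "filter": [
                    {"term": {ip_field: ip}},
                    {"range": {time_field: {
                        "gte": (when - delta).isoformat(),
                        "lte": (when + delta).isoformat(),
                    }}},
                ]
            }
        },
    }


# Hypothetical alert: an IP address and a timestamp, nothing else.
alert_time = datetime(2020, 1, 26, 0, 53, 20, tzinfo=timezone.utc)
query = build_dhcp_query("10.0.0.42", alert_time)
```

Using `filter` clauses rather than `must` keeps the query to a plain yes/no match, which is all this lookup needs.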

Integrating TheHive and ElasticSearch

As you may or may not know, TheHive uses an underlying enrichment engine called Cortex.

In short, Cortex works by leveraging analyzers (used for collecting information related to an observable, for instance collecting information from VirusTotal about a checksum) and responders (used to act on information, for instance pushing an IP to a blacklist or sending out an email).

With this in mind, I figured that the way to go would be to create an analyzer that could query ElasticSearch and return the hostname that was using the given IP address at the specified time.

I figured that the way to do this would be to create the event in TheHive and attach the given IP address as an observable, from which the analyzer could be run.

This, however, turned out to be somewhat of a dead end for me, as analyzers have the caveat of only working on observables. The only way I was able to provide a timestamp to the analyzer was to manually type it into the message field of the observable. I briefly considered this, but decided it would be way too error-prone in a production environment, as the timestamp would have to adhere to specific formatting rules.

Because of this caveat, I started looking at what would be possible if I implemented this as a responder instead (even though this is not how responders are supposed to be used).

I quickly realized that because responders can be invoked on events, alerts and observables, a responder has access to a wide range of information related to the event, even if it is implemented to only work with observables.

With this in mind, I was able to implement functional timestamps using customFields with datatype datetime.
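As a rough sketch of what reading such a timestamp could look like: the snippet below pulls a datetime customField out of the input a responder receives for an observable. It assumes the TheHive 3 style layout, where the parent case is nested under the observable as `case` and a datetime customField is stored as epoch milliseconds under the key `date`. The field name `alertTimestamp` is hypothetical.

```python
from datetime import datetime, timezone


def custom_datetime(case, field_name):
    """Return a datetime customField of `case` as an aware UTC datetime.

    Assumes the layout {"customFields": {name: {"date": <epoch ms>}}}.
    Returns None if the field is missing or has no value.
    """
    field = case.get("customFields", {}).get(field_name) or {}
    millis = field.get("date")
    if millis is None:
        return None
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


# Hypothetical responder input, trimmed to the parts used here.
responder_input = {
    "dataType": "thehive:case_artifact",
    "data": {
        "dataType": "ip",
        "data": "10.0.0.42",
        "case": {"customFields": {"alertTimestamp": {"date": 1580000000000}}},
    },
}

observable = responder_input["data"]
when = custom_datetime(observable["case"], "alertTimestamp")
print(observable["data"], when.isoformat())
```

Because the value arrives as a number picked in TheHive's date widget, there is no free-text format to get wrong, which is exactly what made the message field approach unattractive.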

This meant that I was able to implement a functional responder that could query ElasticSearch (through the standard REST API) and return a report containing all relevant entries matching the query.
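A minimal sketch of that round trip, using only the Python standard library, could look like the following. The `_search` endpoint and the `hits.hits[]._source` response shape are standard ElasticSearch; the `hostname` field name and the sample data are assumptions, and authentication is left out entirely.

```python
import json
import urllib.request


def search(base_url, index, body, timeout=10):
    """POST a query body to the _search endpoint and return the decoded JSON."""
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def hostnames(result, field="hostname"):
    """Collect the distinct values of `field` from the hits, in hit order."""
    seen = []
    for hit in result.get("hits", {}).get("hits", []):
        value = hit.get("_source", {}).get(field)
        if value and value not in seen:
            seen.append(value)
    return seen


# Trimmed, made-up example of the response shape returned by _search.
sample = {"hits": {"hits": [
    {"_source": {"hostname": "laptop-17", "ip": "10.0.0.42"}},
    {"_source": {"hostname": "laptop-17", "ip": "10.0.0.42"}},
    {"_source": {"hostname": "printer-03", "ip": "10.0.0.42"}},
]}}
print(hostnames(sample))
```

Keeping the parsing separate from the HTTP call makes it possible to test the report logic against saved responses without a running cluster.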

I was, however, not entirely satisfied with this, as I felt it could only be considered partial automation: I would still have to read through the returned report and manually input the results as new observables.

Completing the automation

Using Cortex, I felt quite limited in what I could do with my results, so I started contemplating how to take my attempted automation a step further, and started looking into the REST API for TheHive.

This gave me all the possibilities I wanted. I was able to leverage another customField called autoEnrichment (with datatype boolean) to define whether I wanted the responder to automatically create new observable(s) from the ElasticSearch results.
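The sketch below illustrates the idea under a few assumptions: that a boolean customField is stored as `{"boolean": true}`, that observables are created by POSTing to `/api/case/<case id>/artifact` as in the TheHive 3 API, and that an API key is passed as a Bearer token. The URL, API key, case id and hostname are placeholders, and the requests are only built, not sent.

```python
import json
import urllib.request


def auto_enrichment_enabled(case):
    """True only if the boolean customField autoEnrichment is set to true."""
    field = case.get("customFields", {}).get("autoEnrichment") or {}
    return field.get("boolean") is True


def observable_request(thehive_url, api_key, case_id, hostname):
    """Build (but do not send) the request that adds a hostname observable."""
    payload = {
        "dataType": "hostname",
        "data": hostname,
        "message": "Added automatically from ElasticSearch results",
        "tlp": 2,
        "ioc": False,
    }
    return urllib.request.Request(
        f"{thehive_url.rstrip('/')}/api/case/{case_id}/artifact",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder case and lookup result.
case = {"id": "AWxyz", "customFields": {"autoEnrichment": {"boolean": True}}}
requests_to_send = []
if auto_enrichment_enabled(case):
    for name in ["laptop-17"]:
        requests_to_send.append(
            observable_request("http://thehive.example:9000", "API_KEY", case["id"], name)
        )
# Sending is left out here; each request would go through urllib.request.urlopen.
```

Treating anything other than an explicit `true` as "off" means a case without the customField never gets observables added behind the analyst's back.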

The actual code

Analyzers and responders usually consist of the following:

  • A requirements file (which defines which non-standard libraries are needed for the analyzer/responder to work)
  • A JSON file (defining the prerequisites for the responder/analyzer, such as which datatypes it can work with)
  • The analyzer/responder itself (the actual code that performs the required operations)
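To give an impression of the first two items in the list above, the following snippet builds a JSON descriptor of the kind Cortex reads for a responder, plus the contents of a matching requirements file. The key names follow the Cortex service definition format as I understand it; every value (name, command path, configuration item) is a hypothetical placeholder and not taken from my actual responder.

```python
import json

# Keys follow the Cortex service definition format as I understand it;
# every value below is a hypothetical placeholder.
descriptor = {
    "name": "ElasticSearch_HostnameLookup",
    "version": "1.0",
    "author": "Your Name",
    "url": "https://example.com/your-repo",
    "license": "AGPL-V3",
    "description": "Find the hostname that held an IP address at a given time",
    # Responders declare which TheHive objects they can be invoked on.
    "dataTypeList": ["thehive:case_artifact"],
    "command": "ElasticSearch_HostnameLookup/responder.py",
    "baseConfig": "ElasticSearch_HostnameLookup",
    "configurationItems": [
        {
            "name": "elasticsearch_url",
            "description": "Base URL of the ElasticSearch REST API",
            "type": "string",
            "multi": False,
            "required": True,
        }
    ],
}

descriptor_json = json.dumps(descriptor, indent=2)
print(descriptor_json)

# The requirements file lists third-party packages, one per line.
requirements_txt = "cortexutils\n"
```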

I, however, chose to split the actual analyzer/responder file into 3 separate files.

The idea behind this is to separate the initialization, configurable items, and functionality, in an attempt to make the responder easier to maintain and easier to build upon, should a need arise for a similar responder that can handle other types of logs.
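As a loose illustration of that separation (not the actual layout of my code), the sketch below keeps the three concerns apart in a single snippet. All names, including `LookupConfig`, `summarize`, `run` and the field names, are hypothetical.

```python
from dataclasses import dataclass


# --- Configurable items (would live in their own file) ---
@dataclass(frozen=True)
class LookupConfig:
    index: str = "dhcp-*"
    ip_field: str = "client.ip"
    hostname_field: str = "hostname"


# --- Functionality (would live in its own file) ---
def summarize(entries, config):
    """Reduce raw log entries to the distinct hostnames they mention."""
    return sorted({e[config.hostname_field] for e in entries
                   if config.hostname_field in e})


# --- Initialization (the entry point that wires everything together) ---
def run(job_input, fetch_entries, config=LookupConfig()):
    """`fetch_entries` is passed in, so the log source can be swapped out."""
    ip = job_input["data"]["data"]
    entries = fetch_entries(ip, config)
    return {"ip": ip, "hostnames": summarize(entries, config)}


def fake_fetch(ip, config):
    """Stand-in for the real ElasticSearch lookup."""
    return [{"hostname": "laptop-17"}, {"hostname": "laptop-17"}]


report = run({"data": {"data": "10.0.0.42"}}, fake_fetch)
print(report)
```

Supporting another type of log would then mostly mean a different config and fetch function, with the entry point left untouched.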

In keeping with the spirit of maintainability (and best practice), I have also tried to document the code with comments explaining the functionality and the thoughts behind each code section, so most of the code should be somewhat self-explanatory.

So without further ado, here is a link to the GitHub repo with the code:
