Watching for no data

Introduction

So you are sending data to your Elasticsearch cluster with some Beat, e.g. Filebeat. But as everyone knows, things go wrong and stuff breaks. You are trying to be proactive and watch for things breaking, so why not let Elasticsearch monitor for missing data with a watcher? You go searching for some examples and, sure enough, you end up at this repo: https://github.com/elastic/examples

The examples repo

This repo provides examples of how to do various things with your shiny Elasticsearch setup. And if you look in the alerting category, you will find a recipe called system fails to provide data. Oh yeah…

Looks pretty useful. Basically you are setting up a watcher that searches an index for the hosts seen in the last 24 hours and for the hosts seen in the last hour. However, there is a catch: the sample doesn't show how to compute the delta between the two. You just end up with two lists that you have little use for 😉
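For reference, a search that produces those two lists could look roughly like the sketch below, written as a Python dict so it is easy to tweak and dump as JSON. The aggregation names periods, history, last_period and hosts are the ones the watch payload uses later in this post. The field names @timestamp and host.name, the bucket size of 100 and the helper build_search_body are my assumptions, not something copied from the repo.

```python
import json


def build_search_body(history="now-24h", last_period="now-1h",
                      time_field="@timestamp", host_field="host.name"):
    """Build a search body with a 'filters' aggregation and a 'terms' sub-aggregation.

    time_field and host_field are assumed names; adjust them to your index mapping.
    """
    def since(start):
        # Range filter matching every document newer than `start` (date math).
        return {"range": {time_field: {"gte": start}}}

    return {
        "size": 0,  # only the aggregations are needed, not the hits
        "aggs": {
            "periods": {
                "filters": {
                    "filters": {
                        "history": since(history),
                        "last_period": since(last_period),
                    }
                },
                "aggs": {
                    # One bucket per host inside each period.
                    "hosts": {"terms": {"field": host_field, "size": 100}}
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_search_body(), indent=2))
```

Each named filter gets its own hosts sub-aggregation, which is where the two lists come from.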

The revised sample

Every chance I get when I talk to my friends at Elastic, I tell them the watcher is too hard to use. Make it simpler, please. And they smile and say, “we know” 🙂

So back to the problem.

You have to do some very funky-looking Painless scripting to find the delta of the two lists we started out with. You do this by means of a transform.

This is how the transform section looks in the repo. It is basically empty, so there will be no transform going on.

  "actions": {
    "log": {
      "transform": {
        "script": {
          "id": "transform"
        }
      },
      "logging": {
        "text": "Systems not responding in the last {{ctx.metadata.last_period}} minutes:{{#ctx.payload._value}}{{.}}:{{/ctx.payload._value}}"
      }
    }
  }

So this is my attempt to fix this problem. Don't get scared, it is not as bad as it looks. Just add it to the watcher.

  "transform": {
    "script": {
      "source": "def last_period = ctx.payload.aggregations.periods.buckets.last_period.hosts.buckets.stream().map(p -> p.key ).collect(Collectors.toList());def history = ctx.payload.aggregations.periods.buckets.history.hosts.buckets.stream().map(e -> e.key ).filter(p -> !last_period.contains(p)).map(p -> [ 'hostname':   p]).collect(Collectors.toList());return  history;",
      "lang": "painless"
    }
  }

Here is the source code laid out in a more readable format. Multiline Painless scripts in the watcher UI, please, Elastic 😀

// Hosts that reported in the last period
def last_period = ctx.payload.aggregations.periods.buckets.last_period.hosts.buckets
  .stream()
  .map(p -> p.key)
  .collect(Collectors.toList());

// Hosts seen in the history window but not in the last period
def history = ctx.payload.aggregations.periods.buckets.history.hosts.buckets
  .stream()
  .map(e -> e.key)
  .filter(p -> !last_period.contains(p))
  .map(p -> ['hostname': p])
  .collect(Collectors.toList());

return history;

That code will make a nice list of the hosts that haven’t delivered data in the last period.
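If you want to sanity-check the logic outside the watcher, here is a rough Python equivalent run against a hand-made payload. The host names and the function name missing_hosts are made up for illustration; only the payload path mirrors the script above.

```python
def missing_hosts(payload):
    """Return the hosts present in 'history' but absent from 'last_period'."""
    buckets = payload["aggregations"]["periods"]["buckets"]
    # Keys of the hosts that reported in the last period.
    last_period = {b["key"] for b in buckets["last_period"]["hosts"]["buckets"]}
    # Keep the history hosts that are not in the last period, wrapped the same
    # way as the Painless script does: one {'hostname': ...} map per host.
    return [
        {"hostname": b["key"]}
        for b in buckets["history"]["hosts"]["buckets"]
        if b["key"] not in last_period
    ]


# Hypothetical payload: three hosts seen in history, two of them still reporting.
sample_payload = {
    "aggregations": {
        "periods": {
            "buckets": {
                "history": {"hosts": {"buckets": [
                    {"key": "web-01"}, {"key": "web-02"}, {"key": "db-01"},
                ]}},
                "last_period": {"hosts": {"buckets": [
                    {"key": "web-01"}, {"key": "db-01"},
                ]}},
            }
        }
    }
}

if __name__ == "__main__":
    print(missing_hosts(sample_payload))  # [{'hostname': 'web-02'}]
```

When every host is still reporting, the result is an empty list, which is what the condition in the action section relies on.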

To use the list in the action section, you do something like this. Notice the condition in there as well, which prevents the watcher from going off and sending emails when everything is working:

  "actions": {
    "log": {
      "condition": {
        "compare": {
          "ctx.payload._value.0": {
            "not_eq": null
          }
        }
      },
      "email": {
        "profile": "standard",
        "to": [
          "whoever@whatever.com"
        ],
        "subject": "oh no, data missing",
        "body": {
          "html": "<h1>Systems not delivering data in the last {{ctx.metadata.last_period}} period</h1>  <ul> {{#ctx.payload._value}}<li>{{hostname}}</li>{{/ctx.payload._value}}</ul>"
        }
      }
    }
  },
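The transform returns a list rather than a map, so it ends up under ctx.payload._value, and the condition simply asks whether that list has a first element. A small Python sketch of that check and of the list rendering, assuming the payload shape shown above (should_send and render_items are hypothetical helper names, and the HTML escaping is just a precaution in this sketch):

```python
from html import escape


def should_send(payload):
    """Mimic the compare condition: ctx.payload._value.0 not_eq null."""
    value = payload.get("_value") or []
    return len(value) > 0 and value[0] is not None


def render_items(payload):
    """Mimic the template section that loops over ctx.payload._value."""
    return "".join(
        "<li>{}</li>".format(escape(host["hostname"]))
        for host in payload["_value"]
    )


if __name__ == "__main__":
    quiet = {"_value": []}                        # every host is reporting
    noisy = {"_value": [{"hostname": "web-02"}]}  # one host went silent
    for payload in (quiet, noisy):
        if should_send(payload):
            print("<ul>" + render_items(payload) + "</ul>")
        else:
            print("nothing to report")
```

With an empty list there is no element 0, so the action is skipped and no email goes out.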

Conclusion

As usual, there is more than one way to achieve the same thing. You could probably also do it with an extremely complex search. But if you add these two sections to your watcher, you are good to go.
