As good as your configuration management tool may be, it can only do its job if it’s running. Here are some tips for making sure it is, whatever happens.
Why just cf-execd?
With Cfengine, the “heavy lifting” is done by cf-agent, which is normally run at regular intervals by cf-execd (a daemon that runs all the time).
Frequently, servers will also run two other daemons: cf-monitord (keeps statistics on a system) and cf-serverd (allows local file sharing, and remote on-demand execution of cf-agent). Common practice is to include in cf-agent’s configuration a promise that ensures the desired daemons are running, and starts them if not.
This makes sense, but what happens if cf-execd gets stopped, and then cf-agent is never run again? Well, this should never happen of course. But, out there in the real world, stuff happens:
- maybe you ran out of RAM, and OOM-killer picked cf-execd for some weird reason
- an administrator unwisely killed cf-execd without really knowing what it does
- possibly you messed up your configuration and had it killed automatically (after all, to err is human…)
OK, so how do you avoid that?
Enough rambling, here is what we do:
- Use a promise in the configuration that ensures the daemons we want running are indeed running, or start them if not
- Configure cron to check on cf-execd, and start it if it’s not running
The promise we use is derived from one provided with the Cfengine sources, as follows:
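A minimal sketch of such a promise, modeled on the process-restart example from the Cfengine 3 documentation. The bundle name, the class-name prefixes (start_, restarted_), the if_repaired body defined at the bottom and the /var/cfengine/bin path are assumptions for illustration, so adjust them to your own setup:

```cf3
# Standalone test harness only; drop this if the bundle is called from your own policy
body common control
{
bundlesequence => { "ensure_daemons" };
}

bundle agent ensure_daemons
{
vars:
  # Daemons to keep alive; each promise below iterates over this list
  "component" slist => { "cf-monitord", "cf-serverd", "cf-execd" };

processes:
  # If no matching process is found, define start_<daemon>
  # (canonified, e.g. start_cf_execd)
  "$(component)"
      restart_class => canonify("start_$(component)");

commands:
  # Only runs for daemons flagged above; the path is the usual default (assumption)
  "/var/cfengine/bin/$(component)"
      ifvarclass => canonify("start_$(component)"),
      # Class names set through a classes body are canonified automatically
      classes    => if_repaired("restarted_$(component)");

reports:
  # Printed only when the start command was run and returned success
  "Restarted $(component)"
      ifvarclass => canonify("restarted_$(component)");
}

# Defines class x when the promise it is attached to was repaired
body classes if_repaired(x)
{
promise_repaired => { "$(x)" };
}
```

Saved to a file of its own, this can be tried out with cf-agent -f followed by the file name; on a healthy host it should do nothing and print nothing.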
The above Cfengine example uses some interesting concepts:
- A list of the daemon names to check, which is iterated over by each of the three following promises, and reused in their attributes.
- Ordering: the first promise, a processes promise, checks if a daemon is running and defines a class if not; then, the second promise restarts the daemon if that class was set, and finally, a report is printed if the restart went OK.
- Generic class names: the class names are built from the daemon names in the list, so whichever daemons you choose to check, the matching classes are set and read automatically.
Last, but not least, here is the line we add to /etc/crontab:
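A sketch of what such a line could look like. The five-minute interval, the use of pgrep and the /var/cfengine/bin path are assumptions rather than requirements:

```
# m  h dom mon dow user command
*/5 * * * * root if ! pgrep -x cf-execd > /dev/null; then /var/cfengine/bin/cf-execd; fi
```

Note the user field (root): it is required in /etc/crontab, but must be left out of a per-user crontab.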
Of course, we wouldn’t add that line manually, but instead use a Cfengine promise to add it if, and only if, it’s not already in /etc/crontab… This is a subject for another post, but here’s a sneak preview:
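A minimal sketch of such a promise. It relies on insert_lines being convergent: the line is only added when no identical line is already in the file. The bundle names and the exact cron line are assumptions for illustration:

```cf3
# Standalone test harness only; drop this if the bundle is called from your own policy
body common control
{
bundlesequence => { "ensure_cron_watchdog" };
}

bundle agent ensure_cron_watchdog
{
vars:
  # The watchdog entry for /etc/crontab (hypothetical interval and command)
  "cron_line" string => "*/5 * * * * root if ! pgrep -x cf-execd > /dev/null; then /var/cfengine/bin/cf-execd; fi";

files:
  # Edit the existing file in place; it is not created if missing
  "/etc/crontab"
      edit_line => append_line_if_missing("$(cron_line)");
}

# Appends the given line at the end of the file unless an identical line exists
bundle edit_line append_line_if_missing(line)
{
insert_lines:
  "$(line)";
}
```

If the standard library is already loaded, its append_if_no_line bundle does the same job, so the helper bundle above is not needed.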
Using the above promises, you will ensure that the Cfengine components you want will always be running (or at least restarted if they stop). Of course, it’s probably a good idea to monitor these promises, so that you don’t end up with a start/stop fight…