Make essentials survive the OOM-killer

I have an Omnia with 2GB of RAM. Most of the time, it has plenty of memory available. But once in a while, and I haven’t been able to predict it, some cron-triggered job consumes all the memory and triggers the oom-killer. The culprit used to be ucollect, so I disabled that, and the router runs more stably, but there’s still something occasionally killing the router without a clear message in the logs.

Usually the first thing to go is the DNS. kresd, dnsmasq, unbound, they all get killed. Next, Foris dies.

By this time, users are complaining, so I reboot the router. I would rather not have to do that. I would prefer for the router to run continuously with no intervention, rebooting by itself when it updates.

Since the router has more than enough RAM to run essential services, I think they should survive the oom-killer. What’s the best way to add protection? At least kresd, odhcpd, and sshd should be immune. Also udhcp.

kresd will also be killed by SIGBUS if you fill /tmp (which lives in RAM), thanks to the kernel over-committing the space for the cache.

The oom-killer can be disabled for individual processes. You do this at your own risk, because in some situations virtually any process can become very “greedy” and therefore a legitimate candidate to be killed.

There are multiple ways to disable the oom-killer for a process. One of the “quick & dirty” but reliable ones is to add something like this to /etc/rc.local:

echo -17 > /proc/$(ps -C kresd -o pid --no-headers)/oom_adj

This command runs after the entire system is initialized and disables the oom-killer for kresd. For real use it should of course be rewritten in a more robust form (one that checks the result of ps, for example). Instead of disabling the oom-killer altogether, you can also just make the process a less likely victim by setting a negative value between -1 and -16.
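A more defensive version of that idea might look like the sketch below. It is only an illustration, not a tested Turris recipe. It assumes a kernel that provides /proc/<pid>/oom_score_adj (the newer interface that replaced the deprecated oom_adj, with a range of -1000 to 1000, where -1000 exempts the process) and that pidof is available. The function names protect_pids and protect_names and the variables PROC_ROOT and OOM_ADJ are made up for this example.

```shell
#!/bin/sh
# Sketch: make selected daemons unattractive to the oom-killer.
# PROC_ROOT and OOM_ADJ are hypothetical knobs for this example only.
PROC_ROOT="${PROC_ROOT:-/proc}"
OOM_ADJ="${OOM_ADJ:--1000}"   # -1000 exempts a process; e.g. -500 only lowers its score

# Write OOM_ADJ into oom_score_adj for every PID given as an argument.
# Returns non-zero if any PID could not be adjusted.
protect_pids() {
    rc=0
    for pid in "$@"; do
        f="$PROC_ROOT/$pid/oom_score_adj"
        if [ -w "$f" ]; then
            printf '%s\n' "$OOM_ADJ" > "$f" || rc=1
        else
            echo "oom-protect: cannot write $f" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Look up all PIDs of each named daemon and protect them.
# Daemons that are not running are reported and skipped.
protect_names() {
    for name in "$@"; do
        pids=$(pidof "$name" 2>/dev/null) || {
            echo "oom-protect: $name is not running" >&2
            continue
        }
        # Left unquoted on purpose: pidof may print several PIDs.
        protect_pids $pids
    done
}

# In /etc/rc.local one might then call, for example:
# protect_names kresd odhcpd sshd
```

Note that the setting belongs to the running process, so a daemon that is restarted later gets a new PID and would need to be adjusted again.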

Nitpick: the preferred way to get the PID is pidof kresd
