Kresd progressively failing at some point

Once in a while (seemingly a few days after a software update that was applied without a reboot, and without any email notice that a reboot was needed), kresd stops responding.

In my messages log, I notice pairs of entries like these:

Aug 22 03:07:03 turris /dhcp_host_domain_ng.py: Kresd socket failed:<class 'OSError'>,timed out
Aug 22 03:07:05 turris /dhcp_host_domain_ng.py: Kresd socket failed:<class 'OSError'>,timed out

After a bunch of these, I eventually see a pair like this:

Aug 22 03:08:43 turris /dhcp_host_domain_ng.py: Kresd socket failed:<class 'OSError'>,timed out
Aug 22 03:08:43 turris /dhcp_host_domain_ng.py: Kresd socket failed:<class 'OSError'>,[Errno 11] Resource temporarily unavailable

Thereafter, I keep getting that last error message, in pairs:

Aug 22 03:09:21 turris /dhcp_host_domain_ng.py: Kresd socket failed:<class 'OSError'>,[Errno 11] Resource temporarily unavailable
Aug 22 03:09:21 turris /dhcp_host_domain_ng.py: Kresd socket failed:<class 'OSError'>,[Errno 11] Resource temporarily unavailable

Maybe kresd should have some sort of heartbeat-and-restart mechanism? Or better: what is this class 'OSError' thing, and how do we avoid it?
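
In Python, OSError is the built-in exception class for failures reported by the operating system, socket errors included. Both messages in the log fit that: "timed out" is the message of socket.timeout, an OSError subclass raised when a socket operation does not finish before the timeout set on the socket, and "[Errno 11] Resource temporarily unavailable" is EAGAIN on Linux (the operation could not complete right away). A minimal sketch of telling the two apart; the function name classify_socket_error is hypothetical and not taken from the Turris glue script:

```python
import errno
import socket


def classify_socket_error(exc: OSError) -> str:
    """Return a short explanation for an OSError raised by a socket call."""
    # socket.timeout is a subclass of OSError (an alias of TimeoutError since
    # Python 3.10); its message is "timed out": the operation did not finish
    # before the timeout configured with settimeout().
    if isinstance(exc, socket.timeout):
        return "timeout: no answer from the peer in time"
    # errno 11 on Linux is EAGAIN ("Resource temporarily unavailable"):
    # the operation could not complete immediately and should be retried.
    if exc.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
        return "EAGAIN: operation could not complete right now"
    return f"other OS error: {exc}"


# Demonstrate with the two kinds of errors seen in the log.
for err in (socket.timeout("timed out"),
            OSError(errno.EAGAIN, "Resource temporarily unavailable")):
    print(f"{err} -> {classify_socket_error(err)}")
```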

Does DNS actually stop working? These lines come only from a glue script that propagates changes on the DHCP side into local DNS names. The real kresd logs should be in /var/log/resolver.

Oops, I’ve now hit the problem without any software update in between. So this time I only restarted the kresd service instead of rebooting the entire router.

Yes, DNS stops working. DNS failing during prime leisure hours is what prompts me to look into the logs and find these error messages. And it stays broken for hours, as when I was away at a party and my housemates couldn’t use the Internet.
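
The heartbeat idea from the original question could look roughly like this: send a tiny DNS query to the local resolver and count consecutive failures. This is a sketch, not an existing Turris or kresd feature; it assumes kresd answers on 127.0.0.1:53 and that example.com is an acceptable test name, and the names build_query, dns_alive and Watchdog are hypothetical.

```python
import socket
import struct


def build_query(name: str, qid: int = 0x1234) -> bytes:
    """Build a minimal DNS query packet (RFC 1035 layout) for an A record."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def dns_alive(server: str = "127.0.0.1", port: int = 53,
              name: str = "example.com", timeout: float = 2.0) -> bool:
    """Return True if the resolver answers a query within `timeout` seconds."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, port))
            reply, _ = sock.recvfrom(4096)
        except OSError:  # includes socket.timeout
            return False
    # A valid reply echoes our query ID and has the QR (response) bit set.
    return len(reply) >= 12 and reply[:2] == query[:2] and bool(reply[2] & 0x80)


class Watchdog:
    """Decide when to restart after several consecutive failed probes."""

    def __init__(self, max_failures: int = 3):
        self.max_failures = max_failures
        self.failures = 0

    def record(self, ok: bool) -> bool:
        """Record one probe result; return True when a restart is due."""
        if ok:
            self.failures = 0
            return False
        self.failures += 1
        if self.failures >= self.max_failures:
            self.failures = 0  # start counting afresh after triggering a restart
            return True
        return False
```

On the router, something would call dns_alive() periodically (say once a minute) and restart the kresd service whenever Watchdog.record() returns True; the exact restart command on Turris OS is an assumption to check on your system. Requiring several consecutive failures avoids restarting on a single lost UDP packet.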

/var/log/resolver is repetitive. During the time span in question, it only contains this line, repeated at irregular intervals:

Aug 29 01:35:22 turris kresd[3091]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288

Sometimes it repeats after 3 hours, and sometimes after 10 minutes.

That line is normal and irrelevant, or relevant only in that it is logged every time the daemon is (re)started.

I don’t know what you run on your system, but it seems your file-max value is 4096, which is generally too small. Since the complaint comes from kresd, it is not able to open new files, so you can increase the file-max value.

In my case it is:

root@turris:~# cat /proc/sys/fs/file-max
206656

How to increase it:

sysctl -w fs.file-max=262144

To make it persistent, create a file in /etc/sysctl.d, e.g.:

vi /etc/sysctl.d/11-file-max.conf

Add the line below and save:

fs.file-max = 262144

Finally, load the new file (plain `sysctl -p` reads only /etc/sysctl.conf):

sysctl -p /etc/sysctl.d/11-file-max.conf

I believe the defaults are:

root@turris:~# ulimit -n
1024
root@turris:~# ulimit -Hn
4096

which should be just fine for SOHO DNS, but kresd tries to get more, as it generally does not know how big a deployment it is serving.
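
Two different limits come up here: fs.file-max is the system-wide kernel limit on open file handles, while `ulimit -n` and `ulimit -Hn` show the per-process RLIMIT_NOFILE soft and hard limits, and the kresd warning quotes that hard limit (4096). A small Python sketch (Unix only, since it uses the `resource` module) that reads both; the helper names are hypothetical, and fd_warning only mimics the wording of the logged warning, the comparison it makes is an assumption rather than kresd’s actual code:

```python
import resource


def parse_file_max(text: str) -> int:
    """Parse the contents of /proc/sys/fs/file-max (a single integer)."""
    return int(text.split()[0])


def read_limits(file_max_path: str = "/proc/sys/fs/file-max"):
    """Return (system-wide file-max, per-process soft limit, hard limit)."""
    with open(file_max_path) as f:
        file_max = parse_file_max(f.read())
    # RLIMIT_NOFILE is the per-process limit that `ulimit -n` (soft) and
    # `ulimit -Hn` (hard) report in the shell.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return file_max, soft, hard


def fd_warning(hard: int, recommended: int = 524288):
    """Mimic the kresd warning text for a given hard limit (assumed logic)."""
    # RLIM_INFINITY means "unlimited", so it never triggers the warning.
    if hard != resource.RLIM_INFINITY and hard < recommended:
        return (f"hard limit for number of file-descriptors is only {hard} "
                f"but recommended value is {recommended}")
    return None
```

On the router, read_limits() should report a large file-max (such as the 206656 shown above) next to soft and hard limits of 1024 and 4096, which is why raising fs.file-max alone would not change the kresd warning.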