Kresd restarts frequently

When browsing the internet, it often happens that a name can’t be resolved, but reloading the page works.

The /var/log/resolver file looks like this:

Mar 18 17:26:02 turris kresd[20765]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 18 17:29:28 turris kresd[21232]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 18 17:31:08 turris kresd[21539]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 18 17:40:36 turris kresd[22251]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 18 17:40:45 turris kresd[22500]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 18 17:46:08 turris kresd[23054]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 18 17:46:52 turris kresd[23334]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 18 17:47:02 turris kresd[23584]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 18 17:47:36 turris kresd[23850]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 18 17:47:44 turris kresd[24099]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 18 17:49:03 turris kresd[24495]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 18 17:49:12 turris kresd[24745]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288

which suggests that the kresd daemon is restarted periodically. I enabled verbose debug (/etc/resolver/resolver-debug.sh start), and can see that the failed requests don’t make it to the log.
Because of that, I suspect that the failed DNS requests happen during the kresd restart cycle.
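
To line up failed lookups with restarts, it may help to pull the start times out of the log first. The following is only a rough sketch: it assumes that a change of PID between consecutive kresd lines in /var/log/resolver means a fresh start, and the function name and the `year` argument are made up for illustration (syslog lines carry no year, so one has to be supplied).

```python
import re
from datetime import datetime

# Matches syslog lines such as:
#   "Mar 18 17:26:02 turris kresd[20765]: [system] warning: ..."
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kresd\[(\d+)\]:")


def kresd_starts(lines, year=2000):
    """Return (timestamp, pid) for the first line logged by each new kresd PID.

    `year` is a placeholder (an assumption): syslog timestamps omit the year.
    """
    starts = []
    last_pid = None
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not a kresd line
        pid = int(m.group(2))
        if pid != last_pid:
            # A different PID than on the previous kresd line: treat as a restart.
            ts = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
            starts.append((ts, pid))
            last_pid = pid
    return starts


def read_starts(path="/var/log/resolver", year=2000):
    """Convenience wrapper that reads the log file from disk."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return kresd_starts(fh, year=year)
```

Calling `read_starts()` on the router and printing the result gives a list of restart times that can be compared against the moments when a page failed to load.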

From the other threads, it seems that the file-descriptor limit is unlikely to cause problems, and is likely not related to the frequent service restarts. However, the frequent restarts themselves may be a problem.

Could you give me a hint about what next steps I can take to debug the issue?

I tried to run kresd from the command line rather than as a daemon, to see whether it crashes with some SIGSEGV or something else, but it didn’t seem to work:

root@turris:~# /etc/init.d/resolver stop
Called /etc/init.d/kresd stop
remove dhcp script

root@turris:~#  /usr/sbin/kresd --noninteractive -c /tmp/kresd.config /tmp/kresd
[system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288

↑↑ kresd did run, but none of the DNS requests in my local network were answered.
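
One way to check whether a manually started kresd answers at all is to send it a single query and wait for any reply. This is a minimal sketch, not a full DNS client: it hand-builds one A query in the standard DNS wire format and only checks that a reply with the same ID comes back. The function names, the server address and the queried name are placeholders chosen for illustration.

```python
import socket
import struct


def build_query(name, qid=0x1234):
    """Build a minimal DNS query (type A, class IN, recursion desired)."""
    # Header: ID, flags (RD bit set), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").encode("ascii").split(b".")
    qname = b"".join(bytes([len(label)]) + label for label in labels) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def probe(server, name, port=53, timeout=2.0):
    """Return True if `server` sends back a reply carrying the query's ID."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, port))
            reply, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and "connection refused"
            return False
    return len(reply) >= 12 and reply[:2] == query[:2]
```

For example, `probe("127.0.0.1", "example.com")` run on the router itself would show whether anything answers on the loopback address; repeating it with the router's LAN address would show whether the instance is reachable from the local network.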

The limit is related only in the sense that the warning is printed whenever kresd (re)starts.
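
If you want to see which limit the running kresd process actually has, Linux exposes it in /proc/<pid>/limits. A small sketch, assuming that standard file layout; the function names are made up for illustration:

```python
def open_file_limits(limits_text):
    """Extract (soft, hard) for "Max open files" from /proc/<pid>/limits text.

    Values are returned as strings because a limit may read "unlimited".
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns: "Max open files  <soft>  <hard>  files"
            soft, hard = line.split()[3:5]
            return soft, hard
    return None


def limits_for_pid(pid):
    """Read and parse the limits of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits", encoding="ascii") as fh:
        return open_file_limits(fh.read())
```

Passing the PID printed in the latest warning line to `limits_for_pid` should show the same hard limit that kresd complains about.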

Do the common logs in /var/log/messages* also contain nothing related? Especially at the restart moments, which are luckily indicated by the warning.

One thing to check is the output of free and df -h /tmp, to see whether memory or /tmp might be filling up.

Nothing relevant seems to be in /var/log/messages. Surely nothing that correlates with the kresd restart times.

free and df look healthy too:

root@turris:~# df -h
Filesystem                Size      Used Available Use% Mounted on
/dev/mmcblk0p1            7.3G    344.9M      6.8G   5% /
devtmpfs                512.0K         0    512.0K   0% /dev
tmpfs                  1008.1M     24.6M    983.5M   2% /tmp
tmpfs                   512.0K         0    512.0K   0% /dev

root@turris:~# free
              total        used        free      shared  buff/cache   available
Mem:        2064516      153476     1694320       25180      216720     1771300
Swap:             0           0           0

It’s a freshly unpacked/installed system. I only did basic configuration through the web UI and didn’t even install any additional packages (resolver-debug was already installed, contrary to what the Turris wiki page “Debugging DNS problems on Turris routers” says; I wasn’t sure whether the router was new or an Amazon return, though).

OK, I don’t know how I didn’t notice it yesterday, but now I clearly see the reason in /var/log/messages:

Mar 19 11:22:49 turris dhcp_host_domain_ng.py: Refresh kresd leases
Mar 19 11:23:06 turris dhcp_host_domain_ng.py: Refresh kresd leases
Mar 19 11:23:15 turris dhcp_host_domain_ng.py: Refresh kresd leases
Mar 19 11:23:24 turris dhcp_host_domain_ng.py: Refresh kresd leases
Mar 19 11:24:24 turris dhcp_host_domain_ng.py: Refresh kresd leases
Mar 19 11:24:32 turris dhcp_host_domain_ng.py: Refresh kresd leases
Mar 19 11:29:33 turris dhcp_host_domain_ng.py: Refresh kresd leases
Mar 19 11:29:43 turris dhcp_host_domain_ng.py: Refresh kresd leases
Mar 19 11:33:01 turris dhcp_host_domain_ng.py: Refresh kresd leases
Mar 19 11:33:07 turris dhcp_host_domain_ng.py: Refresh kresd leases
Mar 19 11:33:15 turris dhcp_host_domain_ng.py: Refresh kresd leases

I guess it restarts the service on every DHCP refresh, and it’s working as designed.
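
To confirm that guess rather than eyeball it, the restart times from /var/log/resolver can be matched against the “Refresh kresd leases” times from /var/log/messages. The sketch below assumes both lists have already been parsed into `datetime` objects; the function names and the 5-second window are arbitrary choices for illustration.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def nearest_event(ts, events):
    """Return the entry of sorted `events` closest to `ts`, or None if empty."""
    i = bisect_left(events, ts)
    candidates = events[max(i - 1, 0):i + 1]  # neighbours on both sides of ts
    return min(candidates, key=lambda e: abs(e - ts), default=None)


def unexplained_restarts(restarts, refreshes, window=timedelta(seconds=5)):
    """Return restarts that have no lease refresh within `window` of them."""
    refreshes = sorted(refreshes)
    unexplained = []
    for restart in restarts:
        nearest = nearest_event(restart, refreshes)
        if nearest is None or abs(nearest - restart) > window:
            unexplained.append(restart)
    return unexplained
```

An empty result would support the idea that every restart is triggered by a lease refresh; any timestamps left over are restarts that need another explanation.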

Probably the easiest thing I can do then is to disable DNS forwarding completely.

I think forwarding to ISP’s servers can cause trouble in some cases, if they run some legacy resolvers with buggy edge cases that affect DNSSEC validation (which they don’t anticipate). So that is a common thing to try in case of problems, though unrelated to the DHCP stuff (just in case that wasn’t clear).

Either way, collecting verbose logs from some of the failures should get us closer to the real cause.

You should increase the file-descriptor limit in the kernel.

I switched the forwarding to use Cloudflare DNS rather than the ISP’s DNS, and it didn’t help.

Now it looks to me that the reason for restarts is the “Enable DHCP clients in DNS” setting, i.e. every time a device in the local network goes online and gets an IP address, the DNS forwarder restarts with updated config.
I’ll disable this option and check whether that helped.

I also have periodic restarts of kresd on my Turris Omnia configuration, but it is not a crash of kresd.

There is some script which restarts / reloads the resolver configuration when my WAN IP is renewed via DHCP. Even if the renewed IP and DNS information are the same, the script tells kresd to restart or reload its configuration, and the daemon restarts with a new PID.

I think it is not a problem: I have the default keep_cache ‘1’ in /etc/config/resolver.

After switching to Cloudflare DNS, and disabling “Enable DHCP clients in DNS”, kresd still restarts for me, and DNS requests still fail occasionally.

However, this time there’s nothing relevant in /var/log/messages

Also, from /var/log/resolver, it looks like restarts usually happen in pairs with 9 seconds between them. Does anyone have more ideas what it can be / how to debug it further?

Mar 21 09:03:24 turris kresd[28468]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:03:34 turris kresd[28746]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:06:41 turris kresd[29222]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:08:21 turris kresd[29585]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:08:39 turris kresd[29868]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:11:09 turris kresd[30315]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:11:18 turris kresd[30592]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:13:15 turris kresd[30991]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:13:25 turris kresd[31269]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:14:26 turris kresd[31581]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:14:34 turris kresd[31858]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:16:34 turris kresd[32325]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:16:44 turris kresd[32602]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:20:05 turris kresd[658]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:20:15 turris kresd[937]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:21:36 turris kresd[1266]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:21:46 turris kresd[1545]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:22:59 turris kresd[1923]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:23:07 turris kresd[2241]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:23:21 turris kresd[2524]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:26:22 turris kresd[3036]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:26:32 turris kresd[3318]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:26:59 turris kresd[3615]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:27:08 turris kresd[3895]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:37:17 turris kresd[4740]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:37:26 turris kresd[5022]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:38:39 turris kresd[5343]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:38:48 turris kresd[5620]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:40:01 turris kresd[6056]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:41:52 turris kresd[6404]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:42:01 turris kresd[6699]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:52:52 turris kresd[7539]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:53:02 turris kresd[7815]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:56:41 turris kresd[8332]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
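
The pairing is easier to see once the gaps between consecutive restarts are computed. A minimal sketch, assuming the timestamps have already been parsed into `datetime` objects; the function names and the 15-second threshold are arbitrary picks for illustration:

```python
from datetime import datetime, timedelta


def restart_gaps(times):
    """Seconds between each restart and the next one."""
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


def close_pairs(times, max_gap=timedelta(seconds=15)):
    """Return (first, second) for consecutive restarts at most `max_gap` apart."""
    return [(a, b) for a, b in zip(times, times[1:]) if b - a <= max_gap]
```

If most restarts show up in `close_pairs`, that would hint that a single trigger is causing two restarts in a row, which narrows down what to look for in the init and hotplug scripts.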

I have also seen some strange DNS behavior lately: the same warning, and resolution failures without any specific error in the logs, even in verbose mode. Also, the DNS test on the configuration page hangs indefinitely…