When browsing the internet, it often happens that a name can’t be resolved, but reloading the page works.
The /var/log/resolver file looks like this:
Mar 18 17:26:02 turris kresd[20765]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 18 17:29:28 turris kresd[21232]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 18 17:31:08 turris kresd[21539]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 18 17:40:36 turris kresd[22251]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 18 17:40:45 turris kresd[22500]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 18 17:46:08 turris kresd[23054]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 18 17:46:52 turris kresd[23334]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 18 17:47:02 turris kresd[23584]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 18 17:47:36 turris kresd[23850]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 18 17:47:44 turris kresd[24099]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 18 17:49:03 turris kresd[24495]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 18 17:49:12 turris kresd[24745]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
which suggests that the kresd daemon is restarted periodically. I enabled verbose debugging (/etc/resolver/resolver-debug.sh start) and can see that the failed requests don’t make it to the log.
Because of that, I suspect that the failed DNS requests happen during the kresd restart cycle.
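A rough way to check that suspicion is to probe the resolver once a second and note when it stops answering, then compare those times with the kresd start times in /var/log/resolver. The sketch below is only an illustration: the server address 192.168.1.1, the name example.com and the function names are assumptions, not something taken from the router's tooling. Note that any reply, including SERVFAIL, counts as "answered" here, so it only detects the resolver being unreachable.

```python
import socket
import struct
import time


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query packet for an A record (RFC 1035 layout)."""
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)


def probe(server, name, timeout=1.0):
    """Return True if `server` sends any DNS reply for `name` within `timeout`."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and "connection refused"
            return False
    # Same ID and the QR bit set means this is a response to our query.
    return len(reply) >= 4 and reply[:2] == query[:2] and bool(reply[2] & 0x80)


def watch(server="192.168.1.1", name="example.com", interval=1.0):
    """Print a syslog-style timestamp whenever a probe fails; stop with Ctrl-C."""
    while True:
        if not probe(server, name):
            print(time.strftime("%b %d %H:%M:%S"), "no answer from", server)
        time.sleep(interval)
```

Running `watch()` from a machine on the LAN while browsing should print a line for each second in which the resolver did not answer; if those lines cluster around the timestamps of new kresd PIDs, the restart hypothesis becomes more plausible.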
From other threads, it seems that the file-descriptor limit is unlikely to cause problems and is probably unrelated to the frequent service restarts. However, the frequent restarts themselves may be a problem.
Could you suggest what next steps I can take to debug the issue?
I tried to run kresd from the command line rather than as a daemon, to see whether it crashes with a SIGSEGV or something else, but it didn’t seem to work:
root@turris:~# /etc/init.d/resolver stop
Called /etc/init.d/kresd stop
remove dhcp script
root@turris:~# /usr/sbin/kresd --noninteractive -c /tmp/kresd.config /tmp/kresd
[system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
↑↑ kresd did run, but none of the DNS requests from my local network were answered.
Nothing relevant seems to be in /var/log/messages, and certainly nothing that correlates with the kresd restart times.
df and free look healthy too:
root@turris:~# df -h
Filesystem Size Used Available Use% Mounted on
/dev/mmcblk0p1 7.3G 344.9M 6.8G 5% /
devtmpfs 512.0K 0 512.0K 0% /dev
tmpfs 1008.1M 24.6M 983.5M 2% /tmp
tmpfs 512.0K 0 512.0K 0% /dev
root@turris:~# free
total used free shared buff/cache available
Mem: 2064516 153476 1694320 25180 216720 1771300
Swap: 0 0 0
It’s a freshly unpacked and installed system. I only did basic configuration through the web UI and didn’t even install any additional packages (resolver-debug was already installed, contrary to what “Debugging DNS problems on Turris routers” on the Turris wiki says, though I wasn’t sure whether the router was new or an Amazon return).
OK, I don’t know how I didn’t notice it yesterday, but now I clearly see the reason in /var/log/messages:
Mar 19 11:22:49 turris dhcp_host_domain_ng.py: Refresh kresd leases
Mar 19 11:23:06 turris dhcp_host_domain_ng.py: Refresh kresd leases
Mar 19 11:23:15 turris dhcp_host_domain_ng.py: Refresh kresd leases
Mar 19 11:23:24 turris dhcp_host_domain_ng.py: Refresh kresd leases
Mar 19 11:24:24 turris dhcp_host_domain_ng.py: Refresh kresd leases
Mar 19 11:24:32 turris dhcp_host_domain_ng.py: Refresh kresd leases
Mar 19 11:29:33 turris dhcp_host_domain_ng.py: Refresh kresd leases
Mar 19 11:29:43 turris dhcp_host_domain_ng.py: Refresh kresd leases
Mar 19 11:33:01 turris dhcp_host_domain_ng.py: Refresh kresd leases
Mar 19 11:33:07 turris dhcp_host_domain_ng.py: Refresh kresd leases
Mar 19 11:33:15 turris dhcp_host_domain_ng.py: Refresh kresd leases
I guess it restarts the service on every DHCP refresh, and it’s working as designed.
Probably the easiest thing I can do then is to disable DNS forwarding completely.
I think forwarding to the ISP’s servers can cause trouble in some cases, if they run legacy resolvers with buggy edge cases that affect DNSSEC validation (which they don’t anticipate). So disabling forwarding is a common thing to try in case of problems, though it is unrelated to the DHCP stuff (just in case that wasn’t clear).
I switched the forwarding to use Cloudflare DNS rather than the ISP’s DNS, and it didn’t help.
Now it looks to me that the reason for restarts is the “Enable DHCP clients in DNS” setting, i.e. every time a device in the local network goes online and gets an IP address, the DNS forwarder restarts with updated config.
I’ll disable this option and check whether that helped.
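To check whether every kresd start really lines up with a lease refresh, the two logs can be compared by timestamp. This is a minimal sketch under a few assumptions: both files use the syslog format shown in this thread, the file-descriptor warning appears once per kresd start, a refresh within 5 seconds of a start counts as its cause, and the `year` value is only a placeholder because syslog lines carry no year. The function names are made up for illustration.

```python
import re
from datetime import datetime, timedelta

# Syslog prefix such as "Mar 19 11:22:49 "
TS_RE = re.compile(r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s")


def timestamps(lines, needle, year=2000):
    """Parse the timestamp of every line that contains `needle`.

    `year` is a placeholder (assumption): the log lines do not include one.
    """
    found = []
    for line in lines:
        m = TS_RE.match(line)
        if m and needle in line:
            found.append(
                datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
            )
    return found


def unexplained_starts(starts, refreshes, window=timedelta(seconds=5)):
    """Return the kresd starts that have no lease refresh within `window`."""
    return [s for s in starts
            if not any(abs(s - r) <= window for r in refreshes)]


def report(messages="/var/log/messages", resolver="/var/log/resolver"):
    """Print kresd starts that no lease refresh accounts for."""
    with open(messages) as f:
        refreshes = timestamps(f, "Refresh kresd leases")
    with open(resolver) as f:
        starts = timestamps(f, "hard limit for number of file-descriptors")
    for ts in unexplained_starts(starts, refreshes):
        print(ts.strftime("%b %d %H:%M:%S"),
              "kresd start without a lease refresh nearby")
```

If `report()` prints nothing, every start in the log has a matching "Refresh kresd leases" entry; any line it does print points at a restart with some other trigger.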
I also see periodic restarts of kresd on my Turris Omnia, but it is not a crash of kresd.
There is a script which restarts or reloads the resolver configuration whenever my WAN IP is renewed via DHCP. Even if the renewed IP and DNS information are the same, the script tells kresd to restart or reload its configuration, and the daemon restarts with a new PID.
I don’t think it is a problem: I have the default keep_cache ‘1’ in /etc/config/resolver.
After switching to Cloudflare DNS, and disabling “Enable DHCP clients in DNS”, kresd still restarts for me, and DNS requests still fail occasionally.
However, this time there’s nothing relevant in /var/log/messages.
Also, from /var/log/resolver, it looks like restarts usually happen in pairs about 9 seconds apart. Does anyone have more ideas about what it could be or how to debug it further?
Mar 21 09:03:24 turris kresd[28468]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:03:34 turris kresd[28746]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:06:41 turris kresd[29222]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:08:21 turris kresd[29585]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:08:39 turris kresd[29868]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:11:09 turris kresd[30315]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:11:18 turris kresd[30592]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:13:15 turris kresd[30991]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:13:25 turris kresd[31269]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:14:26 turris kresd[31581]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:14:34 turris kresd[31858]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:16:34 turris kresd[32325]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:16:44 turris kresd[32602]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:20:05 turris kresd[658]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:20:15 turris kresd[937]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:21:36 turris kresd[1266]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:21:46 turris kresd[1545]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:22:59 turris kresd[1923]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:23:07 turris kresd[2241]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:23:21 turris kresd[2524]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:26:22 turris kresd[3036]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:26:32 turris kresd[3318]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:26:59 turris kresd[3615]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:27:08 turris kresd[3895]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:37:17 turris kresd[4740]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:37:26 turris kresd[5022]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:38:39 turris kresd[5343]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:38:48 turris kresd[5620]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:40:01 turris kresd[6056]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:41:52 turris kresd[6404]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:42:01 turris kresd[6699]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:52:52 turris kresd[7539]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:53:02 turris kresd[7815]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
Mar 21 09:56:41 turris kresd[8332]: [system] warning: hard limit for number of file-descriptors is only 4096 but recommended value is 524288
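The pairing can be made visible by computing the gap between consecutive starts. The sketch below uses the first nine start times from the log excerpt above; the helper names and the 20-second threshold are arbitrary choices for illustration.

```python
from datetime import datetime

# Start times copied from the first nine lines of the log excerpt above.
STARTS = ["09:03:24", "09:03:34", "09:06:41", "09:08:21", "09:08:39",
          "09:11:09", "09:11:18", "09:13:15", "09:13:25"]


def gaps_seconds(stamps):
    """Seconds between consecutive 'HH:MM:SS' stamps from the same day."""
    times = [datetime.strptime(s, "%H:%M:%S") for s in stamps]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


def quick_followups(stamps, max_gap=20):
    """Return (first, second) pairs where the second start follows
    the first within `max_gap` seconds."""
    gaps = gaps_seconds(stamps)
    return [(stamps[i], stamps[i + 1])
            for i, gap in enumerate(gaps) if gap <= max_gap]


print(gaps_seconds(STARTS))
# → [10, 187, 100, 18, 150, 9, 117, 10]
for first, second in quick_followups(STARTS):
    print(first, "->", second)
```

For this sample, four of the eight gaps are 18 seconds or shorter, which matches the impression that most restarts come in quick pairs, while 09:06:41 stands alone.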
I have also seen some strange DNS behavior lately: the same warning, and resolution failures without any specific error in the logs, even in verbose mode. Also, the DNS test on the configuration page hangs indefinitely…