/etc/resolver/dhcp_host_domain_ng.py causes high load

v.matys · February 15, 2023, 6:46am

The script /etc/resolver/dhcp_host_domain_ng.py is run by the DNS resolver. It reads the DHCP leases and creates a dynamic hosts-like file. We have an ethers file with about 150 records, and we dynamically lease about another 150 addresses.

The load is sometimes above 50 when this script is running!!! It runs in many instances/processes simultaneously.

The router becomes unreachable for several minutes, sometimes several times a day. At other times it copes with this load well enough.

The only workaround for me now is to disable this script by putting sys.exit(0) at the start (if __name…).

I switched from kresd to unbound without any effect.

Hardware: Turris Omnia
OS: /etc/turris-version 6.2.3

Has anybody had similar experiences and found a solution?

Thank you very much.

Viktor

bamf · February 17, 2023, 2:20pm

Hi,

Same issue here: High CPU usage with Unbound on Omnia

I am also desperately looking for a solution. I have a lot of static DHCP leases, so turning off DHCP clients in reForis is not an option.

Can you please show how exactly you put that sys.exit(0) into the script?

v.matys · February 17, 2023, 4:18pm

I did several changes:

1. /etc/init.d/resolver - here I changed the variable DHCP_SCRIPT so that the script is not run:

DHCP_SCRIPT=/bin/true
#DHCP_SCRIPT=/etc/resolver/dhcp_host_domain_ng.py

2. /etc/hotplug.d/dhcp/40-dynamic-domains
exit 0 # at the beginning of the script under #!/bin/sh line

3. /etc/resolver/dhcp_host_domain_ng.py
sys.exit(0) # at the beginning of the script under #!/usr/bin/python3 line

After these modifications the load is under 0.10 almost all the time.

I know this workaround will be overwritten when the packages that install these files are updated. Then I will have to make these modifications again.

It would be nice to be able to disable this script in the resolver using a config parameter.

I’m curious whether the output provided by dhcp_host_domain_ng.py is actually used. To me it makes little sense: it produces a hosts-like file from the current DHCP leases.

I hope somebody from Turris provides a non-workaround solution for us here. Thanks in advance.

Viktor

bamf · February 17, 2023, 7:20pm

Thanks a lot, I will apply all these workarounds.

It’s really a devastating issue, I am currently using my mobile connection for audio calls as Omnia cannot provide a stable connection on my home line.

Please Turris Team, find a way to fix this issue.

vcunat · February 17, 2023, 7:52pm

I wonder if it would work to instead uncheck the “DHCP clients in DNS” in reforis/network-settings/dns (as a simpler workaround)

bamf · February 17, 2023, 8:18pm

So what exactly would that do? All my devices (about 100) have a static assignment. I don’t want to risk them not being accessible via hostname. Is it safe to uncheck that option? What does it even do?

iron-maiden · February 17, 2023, 8:43pm

If you don’t use Knot Resolver (kresd), you can do this:

Edit /etc/hotplug.d/dhcp/40-dynamic-domains

and add exit 0 as the 2nd line, as below.

#!/bin/sh
exit 0
export ACTION
export MACADDR
export IPADDR
export HOSTNAME

python3 /etc/resolver/dhcp_host_domain_ng.py "$@"

vcunat · February 17, 2023, 8:55pm

I’m afraid I don’t recall details around the DHCP names.

bamf · February 20, 2023, 11:49am

It seems this was sufficient to fix the issue.

Are there any downsides in keeping it like that? Will this get overridden by a TOS update?

vcunat · February 20, 2023, 2:40pm

The whole point of the script is to get dynamic names from DHCP to resolve in DNS (name.lan by default), so you’ll certainly lose that. That’s what the reForis checkbox does as well. It does not turn off DHCP, just the transfer of DHCP names to DNS.

iron-maiden · February 20, 2023, 7:30pm

I don’t think there are any downsides. I’m not sure, but it might be overwritten by an update.

It just populates a file for Knot Resolver:

/tmp/dhcp.leases.dynamic
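To make "populates a file" more concrete, here is a minimal sketch of turning DHCP leases into hosts-like lines. This is not the code of dhcp_host_domain_ng.py. It assumes dnsmasq-style lease lines (expiry, MAC, IP, hostname, client ID) and the default lan domain mentioned above. The function name and the exact output format are made up for illustration.

```python
def leases_to_hosts(lease_text, domain="lan"):
    """Convert dnsmasq-style lease lines into hosts-like 'IP name.domain' lines.

    Hypothetical helper for illustration only; the real script may work differently.
    """
    lines = []
    for raw in lease_text.splitlines():
        fields = raw.split()
        # Expected fields: expiry, MAC, IP, hostname, client ID
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        ip, hostname = fields[2], fields[3]
        if hostname == "*":
            continue  # dnsmasq writes '*' when the client sent no hostname
        lines.append(f"{ip} {hostname}.{domain}")
    return lines
```

Leases without a hostname are skipped, so only clients that announce a name end up resolvable as name.lan.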

bamf · February 20, 2023, 9:03pm

Hmm shortly after it seemed fixed, it happened again. 100% load on both cores, disrupting the whole network.

I now did all the modifications v.matys mentioned and also did a chmod -x on the scripts.
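A note on chmod -x: the hotplug script quoted earlier in the thread calls the resolver script as python3 /etc/resolver/dhcp_host_domain_ng.py "$@", and an interpreter invoked that way does not need the execute bit on the file. The sketch below demonstrates this with a throwaway script in a temporary directory. The file name demo.py and the function name are placeholders, not anything from Turris OS.

```python
import os
import subprocess
import sys
import tempfile


def runs_without_exec_bit():
    """Create a non-executable script and run it through the interpreter."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.py")
        with open(path, "w") as f:
            f.write('print("still ran")\n')
        os.chmod(path, 0o644)  # no execute bit, like after chmod -x
        # Passing the file to the interpreter only requires read permission.
        result = subprocess.run(
            [sys.executable, path], capture_output=True, text=True, check=True
        )
        return result.stdout.strip()
```

So removing the execute bit alone is unlikely to stop the script when a caller passes it to python3 explicitly.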

Any information from turris team on this issue yet?

dhopfm · February 21, 2023, 1:06pm

Did you check what’s causing the high CPU load now that you disabled the call to /etc/resolver/dhcp_host_domain_ng.py? Did you terminate the existing instances of that script after making the changes, e.g., by rebooting your router?

bamf · February 21, 2023, 1:20pm

This only happens occasionally: all these processes spawn, consuming all available CPU cycles. This lasts for about a minute or two, then the processes disappear and the network is usable again.

I did not terminate any processes by hand, nor have I rebooted the router yet.

bamf · February 21, 2023, 2:19pm

Well… still happening. And I think I can pin that down to when a device joins my network:

When I connect a new WiFi device
When I wake up my PC from sleep
etc.

Then this happens.

dhopfm · February 21, 2023, 8:05pm

Did you terminate these rogue processes in the meantime? If not, please do so by, e.g., restarting the router.

If you can then still reproduce the issue, press F5 for a tree view, which should show you which parent process is calling /etc/resolver/dhcp_host_domain_ng.py.
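If a tree view is not at hand, the same parent lookup can be sketched in Python by reading /proc. This is a rough sketch that assumes a Linux /proc layout. The function names are made up for illustration, and processes that exit while being inspected are simply skipped.

```python
import os

SCRIPT = "dhcp_host_domain_ng.py"


def parse_ppid(stat_text):
    """Extract the parent PID from the contents of /proc/<pid>/stat."""
    # The command name sits in parentheses and may itself contain spaces or
    # parentheses, so split after the last ')': the fields are then state, ppid, ...
    after_comm = stat_text.rsplit(")", 1)[1].split()
    return int(after_comm[1])


def read_cmdline(pid, proc="/proc"):
    """Return the command line of a process as one space-separated string."""
    with open(os.path.join(proc, str(pid), "cmdline"), "rb") as f:
        return f.read().replace(b"\0", b" ").decode(errors="replace").strip()


def find_callers(needle=SCRIPT, proc="/proc"):
    """Return sorted (pid, ppid, parent cmdline) for processes whose cmdline contains needle."""
    callers = []
    if not os.path.isdir(proc):
        return callers
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            if needle not in read_cmdline(entry, proc):
                continue
            with open(os.path.join(proc, entry, "stat")) as f:
                ppid = parse_ppid(f.read())
        except (OSError, IndexError, ValueError):
            continue  # the process went away while we were reading it
        try:
            parent = read_cmdline(ppid, proc)
        except OSError:
            parent = "?"
        callers.append((int(entry), ppid, parent))
    return sorted(callers)
```

Running find_callers() during a load spike and printing the tuples should show which command line is launching the script instances.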

bamf · February 21, 2023, 8:08pm

Ok, I’m an idiot. I made a backup of the file 40-dynamic-domains in the same folder, /etc/hotplug.d/dhcp, before changing it. So there was still a file with the original content in that folder.

Moved it away now. Let’s see if that helps.
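This backup-copy trap can be checked for mechanically. As far as I know, the hotplug mechanism runs every file it finds in /etc/hotplug.d/dhcp regardless of its name, so a leftover copy there stays active. Below is a minimal sketch that lists the files in that directory which mention the resolver script and do not start with an early exit 0. The function name and the "first command is exit 0" heuristic are assumptions for illustration.

```python
import os


def active_callers(directory="/etc/hotplug.d/dhcp", needle="dhcp_host_domain_ng.py"):
    """List files in `directory` that reference `needle` and do not exit early."""
    found = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        with open(path, errors="replace") as f:
            lines = [ln.strip() for ln in f]
        # Ignore blank lines and comment lines (the shebang counts as a comment).
        code = [ln for ln in lines if ln and not ln.startswith("#")]
        mentions = any(needle in ln for ln in code)
        # Heuristic: the workaround puts 'exit 0' as the first real command.
        disabled = bool(code) and code[0].split("#")[0].strip() == "exit 0"
        if mentions and not disabled:
            found.append(name)
    return found
```

Any name this returns is a file that would still launch the script on a DHCP event.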

bamf · March 2, 2023, 11:19pm

Just updated to 6.2.4 via reForis.

40-dynamic-domains has not been replaced, the exit 0 is still there.

Everything works fine so far, all my local hostnames are resolvable.

Turris Team, please comment on this issue. Is this a bug? Will it be fixed? I would prefer a non-workaround solution that does not risk breaking on updates.

bamf · May 21, 2024, 4:25pm

I have to bring this topic up again.

As it seems impossible to get kresd working again, I am stuck with Unbound.

The issue is still there. It occurs occasionally, and at least every time a device enters the network (gets powered on or wakes from sleep).

unbound-control spawns zombie processes when /etc/resolver/dhcp_host_domain_ng.py is called, which brings my network to a halt for about a minute or two.

Any more ideas on how to fix this? This time unbound-control itself is causing the issue.

vcunat · May 21, 2024, 5:06pm

It’s not impossible. It’s just that no one has offered to help debug your non-standard config modifications. Resetting to factory settings would surely solve this. Or you could diff the /etc dirs between your current state and the factory snapshot, or something…