/etc/resolver/dhcp_host_domain_ng.py causes high load

martin131 · June 11, 2024, 7:32pm

Same problem on my Turris 1.0 with Turris OS version 6.5.2. After connecting a new device to Wi-Fi, the router’s load gets stuck at 50.

I just turned off the “Enable DHCP clients in DNS” option. Maybe it will be good now.

Is there any way I can help you with debugging? @vcunat

bamf · July 8, 2024, 10:41am

I get this when switching to kresd and trying to resolve a local hostname:

root@omnia:~# dig wecker.home.arpa

; <<>> DiG 9.18.24 <<>> wecker.home.arpa
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 33493
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 2

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; EDE: 21 (Not Supported): (CR36)
;; QUESTION SECTION:
;wecker.home.arpa.              IN      A

;; AUTHORITY SECTION:
wecker.home.arpa.       10800   IN      SOA     wecker.home.arpa. nobody.invalid. 1 3600 1200 604800 10800

;; ADDITIONAL SECTION:
explanation.invalid.    10800   IN      TXT     "Blocking is mandated by standards, see references on https://www.iana.org/assignments/locally-served-dns-zones/locally-served-dns-zones.xhtml"

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1) (UDP)
;; WHEN: Mon Jul 08 12:38:28 CEST 2024
;; MSG SIZE  rcvd: 278

What is happening here? Why is it blocked?

vcunat · July 8, 2024, 11:13am

It is blocked because the standards say that it should be blocked by default. And in Turris there’s probably not a good way of inserting those names at this point.
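
To illustrate the matching involved: a minimal Python sketch of a suffix check, assuming a query is caught whenever its name equals a locally-served zone or sits below it. The helper name `is_under` and the one-entry zone list are hypothetical; the IANA registry lists more zones, and this is not how kresd implements the check internally.

```python
# Only the zone discussed in this thread; the IANA registry lists more.
LOCALLY_SERVED = ("home.arpa.",)


def is_under(name: str, zone: str) -> bool:
    """Return True if `name` equals `zone` or is a subdomain of it.

    Comparison is case-insensitive and tolerant of a missing trailing dot.
    """
    name = name.lower().rstrip(".") + "."
    zone = zone.lower().rstrip(".") + "."
    if zone == ".":
        # Every name is under the root zone.
        return True
    # Require a label boundary so "nothome.arpa" does not match "home.arpa".
    return name == zone or name.endswith("." + zone)


if __name__ == "__main__":
    for query in ("wecker.home.arpa", "nothome.arpa", "example.com."):
        blocked = any(is_under(query, zone) for zone in LOCALLY_SERVED)
        print(query, "matches a locally-served zone:", blocked)
```

The same kind of suffix match is what lets a rule scoped to `home.arpa.` cover every hostname below it.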

bamf · July 8, 2024, 11:25am

So what can I do? It works fine with Unbound, but is blocked when using kresd.

home.arpa is my local domain.

vcunat · July 8, 2024, 12:01pm

You could:

- use a different prefix (for now)
- use a 5.7.x workaround: Local name resolving not working if using 'home.arpa' DNS suffix for LAN - #2 by vcunat
- with knot-resolver6 package it will probably just work, but it’s not really a finished thing yet (especially integration into Turris)

bamf · July 8, 2024, 1:13pm

Thanks, seems to work!

bamf · July 8, 2024, 9:24pm

So, one more thing.

As I am now able to use kresd, I set up my local Unbound as the upstream resolver for kresd, so I do not need kresd to prefetch records. How do I properly disable prefetching? Is cache.prefetch = false correct?

This is how my /tmp/kresd.config looks:

--Automatically generated file; DO NOT EDIT
modules = {
    'hints > iterate'
  , 'policy'
}
hints.use_nodata(true)
hints.config('/tmp/kresd/hints.tmp')
net.listen('0.0.0.0', 53, { kind = 'dns' })
net.listen('0.0.0.0',   853, { kind = 'tls' })
net.listen('::', 53, { kind = 'dns' })
net.listen('::',   853, { kind = 'tls' })
trust_anchors.remove('.')
trust_anchors.add_file('/etc/root.keys', true)
net.bufsize(1232)
net.ipv4=true
net.ipv6=true
cache.open(10*MB)
cache.clear()
table.insert(policy.special_names, { count = 0, cb = policy.all(
policy.FORWARD(
{'192.168.100.50@53'
,'fda6:7d51:ff03:0:7c5f:89ff:fea8:fa17@53'
}))})

--- Included custom configuration file from: ---
--- /etc/kresd/custom.conf
policy.add(policy.suffix(policy.PASS, {todname('home.arpa.')}))
cache.prefetch = false
user('kresd','kresd')

vcunat · July 9, 2024, 4:22am

There’s nothing like that (prefetching) in upstream defaults or in the config you pasted.

phoxx · September 11, 2024, 8:46pm

Haven’t read the whole thread but wanted to let you know what caused my Omnia to go high on dhcp_host_domain_ng.py:

It was as simple as two Access Points (APs) that I had to reset after an electrical issue in the house and for some reason, they didn’t get their static IP assigned from the Omnia anymore. Instead I had to assign them another static IP and I thought I’d just do that through their management interface, instead of through LuCI.

The script exploded and caused my network to freeze on some devices. After removing the static IP assignment for the two APs in Turris (and a restart, to make sure) - everything is fine again.

So check if you have different IPs assigned on the client and the router - might lead to overload, especially if it’s an AP I guess…

vcunat · September 12, 2024, 5:17am

I don’t know if it’s related, but an AP shouldn’t assign any addresses. That’s the router’s job. The DHCP protocol should just pass through the AP, with clients using it to get addresses from the router. (The AP might also use DHCP to get an address from the router.)

phoxx · September 12, 2024, 7:08am

No, the AP doesn’t assign addresses, that’s not what I mean.
The router runs the DHCP, each AP has an IP, and to make them somewhat manageable, I assigned static IPs to each AP.

When the electricity went off this Tuesday, I restarted everything, but my APs didn’t want to work with the controller anymore. I reset them and adopted them to the controller. In the same run I gave them static IPs in the controller software and forgot that I already had static IPs on the router. Those IPs didn’t match and the APs didn’t catch the right static IP from the router either…
That caused the script to go wild…

Example:
AP1 on device: 192.168.1.5
AP1 on router: 192.168.1.3

Might be the case for other people here as well.

I removed the static IP on the router for now and the load is back at 0.x.

vcunat · September 12, 2024, 8:09am

Are you sure that the address range assigned dynamically by the DHCP does not contain these static addresses? I’m not confident that the DHCP server is smart enough…
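
One way to check this by hand is sketched below in Python. It assumes the pool is described OpenWrt-style, with `start` as an offset from the network address and `limit` as the number of addresses in the pool. The values 100 and 150 and the `AP2` entry are made up for illustration; substitute whatever your router's DHCP settings actually say.

```python
import ipaddress


def dhcp_pool(network: str, start: int, limit: int):
    """Return (first, last) address of the dynamic pool.

    Assumption: `start` is an offset from the network address and
    `limit` is the pool size, so the last address is first + limit - 1.
    """
    net = ipaddress.ip_network(network)
    first = net.network_address + start
    last = first + limit - 1
    return first, last


def statics_inside_pool(static_leases: dict, network: str, start: int, limit: int) -> dict:
    """Return the static leases whose address falls inside the dynamic pool."""
    first, last = dhcp_pool(network, start, limit)
    return {
        name: ip
        for name, ip in static_leases.items()
        if first <= ipaddress.ip_address(ip) <= last
    }


if __name__ == "__main__":
    # AP1 is taken from the thread; AP2 is a hypothetical overlapping entry.
    leases = {"AP1": "192.168.1.3", "AP2": "192.168.1.120"}
    print(statics_inside_pool(leases, "192.168.1.0/24", start=100, limit=150))
```

Any name this prints has a static address that the DHCP server could also hand out dynamically.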

phoxx · September 12, 2024, 8:24am

The issue is rather that the device demands 192.168.1.5 while the router wants to assign 192.168.1.3, and the device (AP) already got 192.168.1.5 for some reason…
The DHCP doesn’t have to be smart for that, it just keeps trying to assign the “right” IP I guess…

Anyway, problem got solved that way and if anyone experiences the same it’s worth checking if all static IPs are actually correctly assigned.
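
A minimal sketch of that check in Python, assuming you have written down by hand the static lease each device has on the router and the address configured on the device itself. The function name and both dictionaries are hypothetical; nothing here reads the router's configuration.

```python
import ipaddress


def find_mismatches(router_static: dict, device_configured: dict) -> dict:
    """Return {name: (router_ip, device_ip)} for devices whose two addresses differ.

    Devices that appear in only one of the two mappings are skipped.
    """
    mismatches = {}
    for name, router_ip in router_static.items():
        device_ip = device_configured.get(name)
        if device_ip is None:
            continue
        # Compare parsed addresses so formatting differences do not matter.
        if ipaddress.ip_address(router_ip) != ipaddress.ip_address(device_ip):
            mismatches[name] = (router_ip, device_ip)
    return mismatches


if __name__ == "__main__":
    # AP1 uses the addresses from the example earlier in the thread.
    router = {"AP1": "192.168.1.3"}
    device = {"AP1": "192.168.1.5"}
    for name, (router_ip, device_ip) in find_mismatches(router, device).items():
        print(f"{name}: router says {router_ip}, device uses {device_ip}")
```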