Knot-resolver doesn't work with hosts that have only an AAAA record

@vcunat @miska I have been trying to figure this out since Turris 3.10 and I think I know now what is happening.

The problem I was seeing was that

dig aaaa aaaa.v6ns.test-ipv6.com

would return SERVFAIL after the router was rebooted. v6ns.test-ipv6.com is a host that only has a AAAA record, and no A record. If kresd was restarted after boot time, the same dig command would return the correct AAAA record.

Looking at the logs, I saw this output for kresd:

2018-07-04 00:39:40 info kresd[4272]: > net.ipv6 = false

This seems to come from this portion of /etc/init.d/kresd, which appears to be code designed to check whether there’s working IPv6:

( sleep 15 # Wait for resolver to start working and system to boot up
  if ! ip -6 r s | grep -q '^default' &&\
     ping -c 1 api.turris.cz > /dev/null 2>&1 && \
     ! ping -6 -c 1 api.turris.cz > /dev/null 2>&1; then
	echo "net.ipv6 = false" | socat - UNIX-CONNECT:$(sleep 5; ls -1 $DEFAULT_RUNDIR/tty/*) > /dev/null 2>&1
  fi) &

I think the sleep 15 here is the culprit. At least for me, 15 seconds is not enough for the router and resolver to come up with IPv6 available. After I changed it to 30 seconds, the net.ipv6 = false message disappeared from the logs and I no longer had the issue with dig and AAAA records.

Could you change the default timing here to be longer, or figure out if there’s a more reliable method than sleep?
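To illustrate what I mean by something more reliable than a fixed sleep, here is a minimal sketch that polls for an IPv6 default route up to a ceiling. The helper names wait_for and has_v6_default_route are my own, not anything in the existing init script, and I have only reused the same route test the script already does.

```shell
#!/bin/sh
# Sketch only: wait_for and has_v6_default_route are hypothetical helpers,
# not part of /etc/init.d/kresd.

# wait_for TIMEOUT_SECONDS COMMAND [ARGS...]
# Retries COMMAND once per second until it succeeds or TIMEOUT_SECONDS
# attempts have been made. Returns 0 on success, 1 on timeout.
wait_for() {
    wf_timeout=$1
    shift
    wf_elapsed=0
    while [ "$wf_elapsed" -lt "$wf_timeout" ]; do
        if "$@" > /dev/null 2>&1; then
            return 0
        fi
        sleep 1
        wf_elapsed=$((wf_elapsed + 1))
    done
    return 1
}

# Succeeds once an IPv6 default route is present
# (the same check the init script performs with "ip -6 r s").
has_v6_default_route() {
    ip -6 route show | grep -q '^default'
}
```

In the init script, something like `wait_for 60 has_v6_default_route` could run before the existing ping checks in place of sleep 15. The 60 is an arbitrary ceiling I picked, not a tested value, and on a network that really has no IPv6 the full timeout would elapse before the check goes ahead.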

I wonder if this might better be a recurrent service – once every few minutes repeat the test (perhaps with fixed IPs instead of names) and then set net.ipv6 accordingly (i.e. if v4_works, set it to value of v6_works).
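A rough sketch of that rule, under some assumptions: the function names are made up for illustration, the probe addresses are passed in by the caller (I am not suggesting specific ones), and it assumes a ping that accepts -c, -W and -6. It only prints the setting; sending it to kresd would still go through the control socket the way the init script does now.

```shell
#!/bin/sh
# Sketch only: all function names here are hypothetical.

# decide_net_ipv6 V4_OK V6_OK   (each argument is "1" or "0")
# Prints the kresd setting to apply, or nothing when IPv4 is down too,
# because then the result says nothing about IPv6 specifically.
decide_net_ipv6() {
    if [ "$1" != "1" ]; then
        return 0
    fi
    if [ "$2" = "1" ]; then
        echo "net.ipv6 = true"
    else
        echo "net.ipv6 = false"
    fi
}

# probe [PING_OPTIONS...] ADDRESS
# Prints 1 if a single ping gets a reply within 2 seconds, else 0.
probe() {
    if ping -c 1 -W 2 "$@" > /dev/null 2>&1; then
        echo 1
    else
        echo 0
    fi
}

# check_once V4_ADDRESS V6_ADDRESS
# Probes both fixed addresses and prints the resulting setting, if any.
check_once() {
    decide_net_ipv6 "$(probe "$1")" "$(probe -6 "$2")"
}
```

A cron job or a loop with a sleep of a few minutes could call check_once with two fixed addresses and pipe any output to the same socat command the init script already uses. Using IP addresses rather than names keeps the test independent of the resolver that is being configured.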

For reference, here is the link to the GitLab issue: https://gitlab.labs.nic.cz/turris/openwrt/issues/200

Thanks @vcunat. I agree that lengthening the wait time would be an OK fix initially. A periodic check for IPv6 connectivity is a good idea for a longer-term fix, and logging when that check fails would definitely help folks with IPv6 issues.