Unbound returning SERVFAIL after moving to a new location

I have moved to a new location with a new provider. I plugged in Turris and everything works fine except Unbound, which started returning SERVFAIL and I can’t discern why.

I’ll happily provide any necessary information, but I am at a loss as to why it does that. When I change the DNS server manually in resolv.conf, resolving works as it should. Setting up DoT did not help.

Here is the output of resolver-debug:

 /etc/resolver/resolver-debug.sh start
Start debug
== enable verbose logging (reboot to disable it) ==
/usr/sbin/unbound-control
0
ok
resolver.common=resolver
resolver.common.interface='0.0.0.0' '::0'
resolver.common.port='53'
resolver.common.keyfile='/etc/root.keys'
resolver.common.verbose='0'
resolver.common.msg_buffer_size='4096'
resolver.common.msg_cache_size='20M'
resolver.common.net_ipv6='1'
resolver.common.net_ipv4='1'
resolver.common.prefered_resolver='unbound'
resolver.common.ignore_root_key='0'
resolver.common.static_domains='1'
resolver.common.dynamic_domains='1'
resolver.common.forward_custom='99_cloudflare'
resolver.common.forward_upstream='0'
resolver.kresd=resolver
resolver.kresd.rundir='/tmp/kresd'
resolver.kresd.log_stderr='0'
resolver.kresd.log_stdout='0'
resolver.kresd.forks='1'
resolver.unbound=resolver
resolver.unbound.outgoing_range='60'
resolver.unbound.outgoing_num_tcp='1'
resolver.unbound.incoming_num_tcp='1'
resolver.unbound.msg_cache_slabs='1'
resolver.unbound.num_queries_per_thread='30'
resolver.unbound.rrset_cache_size='100K'
resolver.unbound.rrset_cache_slabs='1'
resolver.unbound.infra_cache_slabs='1'
resolver.unbound.infra_cache_numhosts='200'
resolver.unbound.access_control='0.0.0.0/0 allow' '::0/0 allow'
resolver.unbound.pidfile='/var/run/unbound.pid'
resolver.unbound.root_hints='/etc/unbound/named.cache'
resolver.unbound.target_fetch_policy='2 1 0 0 0'
resolver.unbound.harden_short_bufsize='yes'
resolver.unbound.harden_large_queries='yes'
resolver.unbound.qname_minimisation='yes'
resolver.unbound.harden_below_nxdomain='yes'
resolver.unbound.key_cache_size='100k'
resolver.unbound.key_cache_slabs='1'
resolver.unbound.neg_cache_size='10k'
resolver.unbound.prefetch='yes'
resolver.unbound.prefetch_key='yes'
resolver.unbound_remote_control=resolver
resolver.unbound_remote_control.control_enable='yes'
resolver.unbound_remote_control.control_use_cert='no'
resolver.unbound_remote_control.control_interface='127.0.0.1'
== resolv.conf* ==
/etc/resolv.conf:search lan
/etc/resolv.conf:nameserver 127.0.0.1
/tmp/resolv.conf:search lan
/tmp/resolv.conf:nameserver 127.0.0.1
/tmp/resolv.conf.auto:# Interface wan
== DNSSEC root key file ==
cb02e46d912d6e4ab17dbc8289f4d14b  /etc/root.keys
/etc/root.keys:. IN DS 20326 8 2 e06d44b80b8f1d39a95c0b0d7c65d08458e880409bbc683457104237c7f8ec8d
. IN DS 20326 8 2 e06d44b80b8f1d39a95c0b0d7c65d08458e880409bbc683457104237c7f8ec8d
== resolver process ==
TBD
== resolution attempts ==

; <<>> DiG 9.16.31 <<>> @127.0.0.1 +dnssec repo.turris.cz
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 47575
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1232
;; QUESTION SECTION:
;repo.turris.cz.			IN	A

;; Query time: 876 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Sun Aug 28 21:39:40 CEST 2022
;; MSG SIZE  rcvd: 43


; <<>> DiG 9.16.31 <<>> @127.0.0.1 +dnssec www.google.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 37374
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1232
;; QUESTION SECTION:
;www.google.com.			IN	A

;; Query time: 1672 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Sun Aug 28 21:39:42 CEST 2022
;; MSG SIZE  rcvd: 43


; <<>> DiG 9.16.31 <<>> @127.0.0.1 +dnssec www.facebook.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 58906
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1232
;; QUESTION SECTION:
;www.facebook.com.		IN	A

;; Query time: 1712 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Sun Aug 28 21:39:43 CEST 2022
;; MSG SIZE  rcvd: 45


; <<>> DiG 9.16.31 <<>> @127.0.0.1 +dnssec www.youtube.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 40408
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1232
;; QUESTION SECTION:
;www.youtube.com.		IN	A

;; Query time: 1656 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Sun Aug 28 21:39:45 CEST 2022
;; MSG SIZE  rcvd: 44


; <<>> DiG 9.16.31 <<>> @127.0.0.1 +dnssec www.rhybar.cz
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 7406
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1232
;; QUESTION SECTION:
;www.rhybar.cz.			IN	A

;; Query time: 880 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Sun Aug 28 21:39:46 CEST 2022
;; MSG SIZE  rcvd: 42


; <<>> DiG 9.16.31 <<>> @127.0.0.1 +dnssec *.wilda.rhybar.0skar.cz
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 43486
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1232
;; QUESTION SECTION:
;*.wilda.rhybar.0skar.cz.	IN	A

;; Query time: 879 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Sun Aug 28 21:39:47 CEST 2022
;; MSG SIZE  rcvd: 52


; <<>> DiG 9.16.31 <<>> @127.0.0.1 +dnssec *.wilda.nsec.0skar.cz
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 19662
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1232
;; QUESTION SECTION:
;*.wilda.nsec.0skar.cz.		IN	A

;; Query time: 875 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Sun Aug 28 21:39:48 CEST 2022
;; MSG SIZE  rcvd: 50


; <<>> DiG 9.16.31 <<>> @127.0.0.1 +dnssec *.wild.nsec.0skar.cz
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 38153
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1232
;; QUESTION SECTION:
;*.wild.nsec.0skar.cz.		IN	A

;; Query time: 879 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Sun Aug 28 21:39:49 CEST 2022
;; MSG SIZE  rcvd: 49


; <<>> DiG 9.16.31 <<>> @127.0.0.1 +dnssec *.wilda.0skar.cz
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 56770
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1232
;; QUESTION SECTION:
;*.wilda.0skar.cz.		IN	A

;; Query time: 839 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Sun Aug 28 21:39:50 CEST 2022
;; MSG SIZE  rcvd: 45


; <<>> DiG 9.16.31 <<>> @127.0.0.1 +dnssec *.wild.0skar.cz
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 15840
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1232
;; QUESTION SECTION:
;*.wild.0skar.cz.		IN	A

;; Query time: 875 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Sun Aug 28 21:39:51 CEST 2022
;; MSG SIZE  rcvd: 44


; <<>> DiG 9.16.31 <<>> @127.0.0.1 +dnssec www.wilda.nsec.0skar.cz
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 64065
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1232
;; QUESTION SECTION:
;www.wilda.nsec.0skar.cz.	IN	A

;; Query time: 871 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Sun Aug 28 21:39:52 CEST 2022
;; MSG SIZE  rcvd: 52


; <<>> DiG 9.16.31 <<>> @127.0.0.1 +dnssec www.wilda.0skar.cz
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 32431
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1232
;; QUESTION SECTION:
;www.wilda.0skar.cz.		IN	A

;; Query time: 871 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Sun Aug 28 21:39:53 CEST 2022
;; MSG SIZE  rcvd: 47


; <<>> DiG 9.16.31 <<>> @127.0.0.1 +dnssec *.wilda.rhybar.ecdsa.0skar.cz
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 26170
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1232
;; QUESTION SECTION:
;*.wilda.rhybar.ecdsa.0skar.cz.	IN	A

;; Query time: 879 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Sun Aug 28 21:39:54 CEST 2022
;; MSG SIZE  rcvd: 58

Can you verify you have set the correct time on your Turris? I bet the battery backing up the RTC is dead, so your device does not have the current time.

I vaguely remember that current NTP setup does not depend on DNS (for this exact reason), so I’d hope that any time shift should self-correct soon. But…
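For context on why the clock matters at all: DNSSEC signatures (RRSIG records) carry an inception and an expiration time, and a validating resolver that believes it is outside that window treats the data as bogus, which the client sees as SERVFAIL. Below is a minimal sketch of that check, assuming timestamps in the YYYYMMDDHHMMSS form that dig prints. The function name and the sample window are made up for illustration, and real validators also allow some clock skew, which this ignores.

```python
from datetime import datetime, timezone

def clock_within_rrsig_window(now, inception, expiration):
    """Return True if `now` lies inside the RRSIG validity window.

    `inception` and `expiration` are UTC strings in the YYYYMMDDHHMMSS
    presentation form used for RRSIG records.
    """
    fmt = "%Y%m%d%H%M%S"
    start = datetime.strptime(inception, fmt).replace(tzinfo=timezone.utc)
    end = datetime.strptime(expiration, fmt).replace(tzinfo=timezone.utc)
    return start <= now <= end

# Hypothetical signature window, checked against two router clocks.
INCEPTION, EXPIRATION = "20220820000000", "20220910000000"
good_clock = datetime(2022, 8, 28, 19, 39, tzinfo=timezone.utc)
reset_clock = datetime(1970, 1, 1, tzinfo=timezone.utc)  # e.g. dead RTC battery

print(clock_within_rrsig_window(good_clock, INCEPTION, EXPIRATION))   # True
print(clock_within_rrsig_window(reset_clock, INCEPTION, EXPIRATION))  # False
```

If the router clock looked like the second case, every signed answer would fail validation until NTP corrected the time.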

@hagrid That is a very good tip, but unfortunately that is not it. The time is correct (at least now, after 12 hours of running).

Resolving works with external servers, but not with the local Unbound:

root@turris:~# dig facebook.com @127.0.0.1

; <<>> DiG 9.16.31 <<>> facebook.com @127.0.0.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 37828
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;facebook.com.			IN	A

;; Query time: 844 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Mon Aug 29 13:21:06 CEST 2022
;; MSG SIZE  rcvd: 41

root@turris:~# dig facebook.com @8.8.8.8


; <<>> DiG 9.16.31 <<>> facebook.com @8.8.8.8
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 727
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;facebook.com.			IN	A

;; ANSWER SECTION:
facebook.com.		126	IN	A	157.240.30.35

;; Query time: 12 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Mon Aug 29 13:20:52 CEST 2022
;; MSG SIZE  rcvd: 57

root@turris:~# nslookup facebook.com
;; connection timed out; no servers could be reached
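The comparison above (same name, local resolver versus an external one) can be automated when it has to be repeated many times. This is only a rough sketch: it pulls the status out of dig's header line and classifies the pair. The function names and the wording of the verdicts are my own, and the sample strings are the header lines from the two outputs above.

```python
import re

STATUS_RE = re.compile(r"status:\s*([A-Z]+)")

def dig_status(output):
    """Return the response code from dig output, or None if there was no answer."""
    match = STATUS_RE.search(output)
    return match.group(1) if match else None

def compare(local_output, external_output):
    """Classify a local-vs-external pair of dig outputs."""
    local = dig_status(local_output)
    external = dig_status(external_output)
    if local == "SERVFAIL" and external == "NOERROR":
        return "external resolver works, local resolver is failing"
    if local is None and external is None:
        return "no DNS answer from either server"
    return f"local={local}, external={external}"

# Header lines copied from the two dig runs in this post.
LOCAL = ";; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 37828"
EXTERNAL = ";; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 727"

print(compare(LOCAL, EXTERNAL))
```

On the router, the two strings could come from something like subprocess.run(["dig", "facebook.com", "@127.0.0.1"], capture_output=True, text=True).stdout instead of being pasted in.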

This is Turris 1.x, I am running TurrisOS 5.4.1 83b0e20711ee4a927634b3c2a018c93527e84a2b / LuCI branch git-22.115.68448-712bc8e

I am running adblock, if that is of any concern…

Is there any change when you enable the forwarding?

I am at a complete loss. I have now fiddled with these settings for about five minutes. Sometimes DNS started working, but only for about 5-10 queries, and then it went back to SERVFAIL. Once it worked with DoT to CZ.NIC (with DNSSEC disabled), later with DNSSEC enabled, then with some other random combination, but it always broke again within about a minute. Not sure how to debug that…

So, after some Googling I tried messing with unbound-anchor and found this. I am not sure whether it is the cause, but it definitely does not look good:

root@turris:~# unbound-anchor -vvvvvv
/etc/unbound/root.key has content
fail: the anchor is NOT ok and could not be fixed

What content is in that file?

I’d look into logs. That’s generally an advisable step.

As a last resort you can run pkgupdate --reinstall-all to reinstall all packages and see if that fixes your issue.

Hi, I tried to debug this on my own. I ran unbound manually with verbosity 3 and logged everything to a file. It seems that unbound is timing out when trying to contact the root servers.

[1661979646] unbound[9949:0] info: resolving jlk.cz. A IN
[1661979646] unbound[9949:0] debug: request has dependency depth of 0
[1661979646] unbound[9949:0] info: priming . IN NS
[1661979646] unbound[9949:0] debug: mesh_run: iterator module exit state is module_wait_subquery
[1661979646] unbound[9949:0] info: mesh_run: end 9 recursion states (8 with reply, 0 detached), 8 waiting replies, 0 recursion replies sent, 0 replies dropped, 0 states jostled out
[1661979646] unbound[9949:0] info: 0pvCD mod2  . NS IN
[1661979646] unbound[9949:0] info: 1RDdc mod2 rep jlk.cz. A IN
[1661979646] unbound[9949:0] info: 2RDdc mod2 rep sentinel.turris.cz. AAAA IN
[1661979646] unbound[9949:0] info: 3RDdc mod2 rep sentinel.turris.cz. AAAA IN
[1661979646] unbound[9949:0] info: 4RDdc mod2 rep sentinel.turris.cz. A IN
[1661979646] unbound[9949:0] info: 5RDdc mod2 rep sentinel.turris.cz. AAAA IN
[1661979646] unbound[9949:0] info: 6RDdc mod2 rep sentinel.turris.cz. A IN
[1661979646] unbound[9949:0] info: 7RDdc mod2 rep sentinel.turris.cz. A IN
[1661979646] unbound[9949:0] info: 8RDdc mod2 cb . DNSKEY IN
[1661979646] unbound[9949:0] debug: cache memory msg=8272 rrset=8272 infra=4166 val=8392 subnet=16580
[1661979646] unbound[9949:0] debug: timeout udp
[1661979646] unbound[9949:0] debug: svcd callbacks start
[1661979646] unbound[9949:0] debug: worker svcd callback for qstate 0x8ee678

Full log is here: http://upload.jlk.cz/unbound2.log
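For a log of that size, counting a few message types is quicker than reading it line by line. A small sketch, assuming the line layout seen in the excerpt above ("[timestamp] unbound[pid:thread] level: message"). The function name and the three counters are my own choice and not anything unbound provides.

```python
import re
from collections import Counter

# Matches e.g. "[1661979646] unbound[9949:0] debug: timeout udp"
LINE_RE = re.compile(r"^\[(\d+)\] unbound\[\d+:\d+\] (\w+): (.*)$")

def summarize(lines):
    """Count client queries, root priming attempts and UDP timeouts."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not a regular log line
        message = match.group(3)
        if message == "timeout udp":
            counts["udp_timeouts"] += 1
        elif message.startswith("priming "):
            counts["priming_attempts"] += 1
        elif message.startswith("resolving "):
            counts["client_queries"] += 1
    return counts

# Three lines taken from the excerpt above.
SAMPLE = [
    "[1661979646] unbound[9949:0] info: resolving jlk.cz. A IN",
    "[1661979646] unbound[9949:0] info: priming . IN NS",
    "[1661979646] unbound[9949:0] debug: timeout udp",
]

print(dict(summarize(SAMPLE)))
```

For the whole file this would be summarize(open("unbound2.log")). Many UDP timeouts next to the priming attempts fit the reading that the queries to the root servers go unanswered.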

This is consistent with manual digging over SSH, which (as shown) works when I use an external DNS server instead of my Unbound:

root@turris:~# dig a.root-servers.net @1.1.1.1

; <<>> DiG 9.16.31 <<>> a.root-servers.net @1.1.1.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 5242
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;a.root-servers.net.		IN	A

;; ANSWER SECTION:
a.root-servers.net.	3599969	IN	A	198.41.0.4

;; Query time: 8 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Wed Aug 31 23:10:24 CEST 2022
;; MSG SIZE  rcvd: 63

root@turris:~# dig jlk.cz @198.41.0.4 +norec

; <<>> DiG 9.16.31 <<>> jlk.cz @198.41.0.4 +norec
;; global options: +cmd
;; connection timed out; no servers could be reached

Ewww, I can’t see why a packet to/from 198.41.0.4 would be lost while essentially the same one to/from 1.1.1.1 passes through. I think it’s most likely the ISP’s “fault”. If I had the option, I’d try dig (+ ping, etc.) to various addresses and also from LAN devices, possibly even when using a different router. And then ask the ISP with the results, assuming change of the router wouldn’t help (just to be sure), etc.
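To try "dig to various addresses" in one go, a plain UDP probe is enough. The sketch below builds a minimal non-recursive DNS query by hand and reports, per server, whether any reply came back within the timeout. It is an illustration only: it does not check the reply contents or its ID, and it tests nothing beyond UDP port 53. The names build_query, probe and main are made up. 198.41.0.4, 1.1.1.1 and 8.8.8.8 come from this thread, and 193.0.14.129 (k.root-servers.net) is added here as a second root server for comparison.

```python
import socket
import struct

TARGETS = [
    ("a.root-servers.net", "198.41.0.4"),
    ("k.root-servers.net", "193.0.14.129"),  # not from the thread, added for comparison
    ("Cloudflare", "1.1.1.1"),
    ("Google", "8.8.8.8"),
]

def build_query(name, qtype=1, query_id=0x1234, recursion_desired=False):
    """Build a minimal DNS query packet (one question, class IN, no EDNS)."""
    flags = 0x0100 if recursion_desired else 0x0000
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    labels = [label for label in name.rstrip(".").split(".") if label]
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels)
    return header + qname + b"\x00" + struct.pack("!HH", qtype, 1)

def probe(server, name="jlk.cz", timeout=3.0):
    """Send one UDP query to `server` and report whether anything came back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (server, 53))
            sock.recvfrom(4096)
            return "answered"
        except socket.timeout:
            return "timeout"
        except OSError as exc:
            return f"error: {exc}"

def main():
    for label, address in TARGETS:
        print(f"{label:20} {address:15} {probe(address)}")
```

Calling main() on the router and again from a LAN device gives a small table to hand to the ISP. If only the root server rows time out, that points at filtering of those addresses and not at Unbound itself.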

Hi, as @vcunat mentioned, it was indeed a problem with the ISP, which blocked access to the root DNS servers.
