Kresd stopped resolving some names suddenly

Mostly everything is fine: I am surfing the web and nothing seems untoward. But I noticed a site that kresd would not resolve, although the upstream server does. So I tried some diagnostics:

Now I notice it on another site out of the blue. Let’s look at the second:

The basic domain name is fine:

# dig eyebuydirect.com

; <<>> DiG 9.11.19 <<>> eyebuydirect.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 36112
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;eyebuydirect.com.		IN	A

;; ANSWER SECTION:
eyebuydirect.com.	86400	IN	A	107.154.105.49
eyebuydirect.com.	86400	IN	A	107.154.106.49

;; Query time: 26 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Wed Mar 10 21:42:11 AEDT 2021
;; MSG SIZE  rcvd: 77

But an HTTP request to that redirects to the www. subdomain, and:

# dig www.eyebuydirect.com

; <<>> DiG 9.11.19 <<>> www.eyebuydirect.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 34484
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.eyebuydirect.com.		IN	A

;; Query time: 2 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Wed Mar 10 21:42:36 AEDT 2021
;; MSG SIZE  rcvd: 49

SERVFAIL!

To find my upstream DNS I can run ubus call network.interface.wan status and see:

"dns-server": [
		"203.12.160.35",
		"203.12.160.36"
	],

and so:

# dig www.eyebuydirect.com @203.12.160.35

; <<>> DiG 9.11.19 <<>> www.eyebuydirect.com @203.12.160.35
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 32030
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.eyebuydirect.com.		IN	A

;; ANSWER SECTION:
www.eyebuydirect.com.	3017	IN	CNAME	vykgy.x.incapdns.net.
vykgy.x.incapdns.net.	30	IN	A	107.154.128.49

;; Query time: 15 msec
;; SERVER: 203.12.160.35#53(203.12.160.35)
;; WHEN: Wed Mar 10 21:50:48 AEDT 2021
;; MSG SIZE  rcvd: 99

The upstream server resolves it fine. What is kresd suddenly doing? Only yesterday it was forwarding such requests to the upstream servers. OK, so I clear the DNS cache by restarting kresd with `/etc/init.d/resolver restart` and there is no change, still SERVFAIL.

This currently impacts at least www.eyebuydirect.com and www.spiceworks.com, but nothing else I’ve been using … it seems very targeted.

Their DNSSEC has expired: https://dnsviz.net/d/www.eyebuydirect.com/YEil6g/dnssec/

Well, this case will work with a cleared cache, as they have downgraded the zone to insecure status in the meantime. But even my local cache still holds a DS record whose TTL has not expired, and they can no longer serve any valid DNSKEY to match it.
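To make the DS/DNSKEY mismatch described above a bit more concrete, here is a rough shell sketch that compares the key tags in a zone's DS records with the key ids of the DNSKEYs it publishes. The helper names (`ds_tags`, `dnskey_ids`, `unmatched_ds`) are made up for illustration, and it assumes the usual output formats of `dig +short DS` and `dig +multi DNSKEY`. A matching key tag does not prove the signatures validate; it only catches the obvious case where no published key corresponds to the DS.

```shell
# Key tags listed in DS records; expects `dig +short DS <zone>` output on stdin.
# Each DS line looks like: <keytag> <algorithm> <digest-type> <digest>
ds_tags() {
    awk 'NF >= 4 { print $1 }' | sort -u
}

# Key ids of published DNSKEYs; expects `dig +multi DNSKEY <zone>` output on
# stdin, where dig adds a comment such as "; key id = 12345" per key.
dnskey_ids() {
    sed -n 's/.*key id = \([0-9][0-9]*\).*/\1/p' | sort -u
}

# Print every DS key tag that has no DNSKEY with the same id.
# $1 = newline-separated DS tags, $2 = newline-separated DNSKEY ids
unmatched_ds() {
    for tag in $1; do
        echo "$2" | grep -qx "$tag" || echo "$tag"
    done
}

# Usage (needs dig and network access; the zone name is just the example here):
#   ds=$(dig +short DS eyebuydirect.com | ds_tags)
#   keys=$(dig +multi DNSKEY eyebuydirect.com | dnskey_ids)
#   unmatched_ds "$ds" "$keys"
```

Any tag printed by `unmatched_ds` is a DS entry with no corresponding key, which a validating resolver would be expected to treat as bogus.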

Thanks for the insight. It seems the other example suffers the same problem:

https://dnsviz.net/d/www.spiceworks.com/dnssec/

How widespread is this problem? Two sites is not a crisis yet, but it is still a bother for what I was trying to do. Is there any way to temporarily ignore DNSSEC for specific sites (poorly maintained ones that I still want to access)?

And as an aside, what clue led you to suspect that? Or was it a wild guess? In short, I’m keen to learn how to diagnose such glitches myself more readily.

During quick triage I often check dnsviz.net to see whether it reports any errors.

In this case I’d just clear the cache. Restarts do that; otherwise e.g. over ssh:

echo 'cache.clear()' | socat - /tmp/kresd/control/*
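On the earlier question about ignoring DNSSEC for specific zones: Knot Resolver has negative trust anchors via `trust_anchors.set_insecure()`. As a sketch only, the hypothetical helper below (`nta_statement` is my own name, not part of kresd) builds that Lua statement from a list of zones, so it can be piped into the same control socket as the cache command above. I am assuming the socket path shown above and that the setting lasts only until the next restart.

```shell
# Build the Lua statement that marks the given zones as insecure
# (negative trust anchors). Zone names are passed as arguments.
nta_statement() {
    list=""
    for zone in "$@"; do
        # Append each zone as a quoted Lua string, comma-separated.
        list="${list:+$list, }'$zone'"
    done
    printf 'trust_anchors.set_insecure({ %s })\n' "$list"
}

# Usage on the router (assumes the control socket path used above):
#   nta_statement eyebuydirect.com spiceworks.com | socat - /tmp/kresd/control/*
#   echo 'cache.clear()' | socat - /tmp/kresd/control/*
```

Clearing the cache afterwards is probably needed so that records already cached as bogus are fetched again. Remove the exception once the zone operator has fixed things, since it switches off validation for those names.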

Cool, but is there a way to get kresd to report the reason for failure more clearly than dig’s status: SERVFAIL, id: 34484? Or does that code suggest a DNSSEC failure?

In future we plan to implement a standard (Extended DNS Errors, RFC 8914) that shows at least something. For example, Cloudflare already has it:

root@turris:~# dig rhybar.cz @1.1.1.1

; <<>> DiG 9.16.12 <<>> rhybar.cz @1.1.1.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 51168
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; EDE: 6 (DNSSEC Bogus)
;; QUESTION SECTION:
;rhybar.cz.                     IN      A

;; Query time: 40 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Wed Mar 10 12:13:32 CET 2021
;; MSG SIZE  rcvd: 44

EDIT: but to most people the “DNSSEC Bogus” string won’t be sufficient explanation anyway, I think.
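Until the resolver reports a reason itself, one common way to tell a DNSSEC failure apart from other SERVFAILs is to repeat the query with dig's `+cd` (checking disabled) flag and compare the status. A minimal sketch, with hypothetical helper names `dig_status` and `classify`; treat the result as a hint rather than a diagnosis:

```shell
# Extract the status (NOERROR, SERVFAIL, ...) from dig output read on stdin.
dig_status() {
    sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# Compare a normal query with a checking-disabled one.
# $1 = status of the normal query, $2 = status of the +cd query
classify() {
    if [ "$1" = "SERVFAIL" ] && [ "$2" = "NOERROR" ]; then
        echo "likely DNSSEC validation failure"
    elif [ "$1" = "SERVFAIL" ]; then
        echo "fails even without validation"
    else
        echo "resolves normally"
    fi
}

# Usage against the local resolver (needs dig and network access):
#   normal=$(dig www.eyebuydirect.com @127.0.0.1 | dig_status)
#   nocheck=$(dig +cd www.eyebuydirect.com @127.0.0.1 | dig_status)
#   classify "$normal" "$nocheck"
```

If the name only resolves with `+cd`, the upstream data is reachable and it is the validation step that rejects it, which is the situation dnsviz.net showed for the two sites in this thread.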

Agreed. dig has hellishly cryptic output anyhow. A trace of the request and response would be so nice to coax out of it and/or kresd. Hence:

Maybe this was a relatively big incident: https://lists.dns-oarc.net/pipermail/dns-operations/2021-March/021032.html