Resolver fails to resolve some DNS entries

I originally wanted to report this to the Knot Resolver issue tracker, but the tracker considers my report spam, so let’s try it here (apparently the Knot developers read this forum as well):

Resolving se04.se.prima-vod-prep-sec.service.cdn.cra.cz fails using kresd (on Turris Omnia):

; <<>> DiG 9.9.8-P4 <<>> se04.se.prima-vod-prep-sec.service.cdn.cra.cz
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 54055
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;se04.se.prima-vod-prep-sec.service.cdn.cra.cz. IN A

;; Query time: 8 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Fri Jan 20 17:10:23 CET 2017
;; MSG SIZE  rcvd: 74

Unbound resolves this just fine:

; <<>> DiG 9.9.5-9+deb8u9-Debian <<>> se04.se.prima-vod-prep-sec.service.cdn.cra.cz
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62671
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;se04.se.prima-vod-prep-sec.service.cdn.cra.cz. IN A

;; ANSWER SECTION:
se04.se.prima-vod-prep-sec.service.cdn.cra.cz. 60 IN A 84.244.95.23

;; Query time: 1 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Fri Jan 20 17:08:05 CET 2017
;; MSG SIZE  rcvd: 90

It might be that they are doing something non-standard, but the resolver should still be able to deal with it (especially if others do).

Are you sure you have forwarding disabled? I can’t reproduce this problem with knot-resolver-1.1.1.

It is disabled in the Omnia settings, so I assume so.

Anyway, here is kresd.config:

modules = {
    'hints'
  , 'policy'
  , 'stats'
  , predict = {
        window = 30 -- 30 minutes sampling window
      , period = 24*(60/30) -- track last 24 hours
  }
}
hints.config('/etc/hosts')
net.bufsize(4096)
net.ipv4=true
net.ipv6=true
cache.open(20*MB)
cache.clear()

Actually, the failure probably depends on which DNS server it hits. Here is the verbose log from a failure:

[plan] plan 'se03.se.prima-vod-prep-sec.service.cdn.cra.cz.' type 'A'
[resl]   => NS is provably without DS, going insecure
[plan]   plan 'sr02.cdn.cra.cz.' type 'AAAA'
[resl]     => NS is provably without DS, going insecure
[plan]     plan 'ns2.bluetone.cz.' type 'AAAA'
[resl]       => querying: '2001:678:11::1' score: 10 zone cut: 'cz.' m12n: 'bLuEtoNE.Cz.' type: 'NS' proto: 'udp'
[iter]       <= using glue for 'ns.bluetone.cz.'
[iter]       <= using glue for 'ns1.bluetone.cz.'
[iter]       <= using glue for 'ns2.bluetone.cz.'
[iter]       <= referral response, follow
[vldr]       <= DS doesn't exist, going insecure
[vldr]       <= answer valid, OK
[resl]       <= server: '2001:678:11::1' rtt: 25 ms
[resl]       => querying: '2a02:a40:2::13' score: 10 zone cut: 'bluetone.cz.' m12n: 'nS2.bLuetONe.CZ.' type: 'AAAA' proto: 'udp'
[iter]       <= using glue for 'ns1.bluetone.cz.'
[iter]       <= using glue for 'ns.bluetone.cz.'
[iter]       <= using glue for 'ns2.bluetone.cz.'
[iter]       <= rcode: NOERROR
[resl]       <= server: '2a02:a40:2::13' rtt: 3 ms
[resl]     => querying: '2a02:a40:2:200::2:200' score: 11 zone cut: 'cdn.cra.cz.' m12n: 'sR02.CdN.cRA.cz.' type: 'AAAA' proto: 'udp'
[iter]     <= rcode: NOERROR
[ pc ]     => answer cached for TTL=900
[resl]     <= server: '2a02:a40:2:200::2:200' rtt: 5 ms
[plan]   plan 'sr02.cdn.cra.cz.' type 'A'
[resl]     => NS is provably without DS, going insecure
[resl]     => querying: '2a02:a40:2::13' score: 11 zone cut: 'cdn.cra.cz.' m12n: 'SR02.CdN.cRa.CZ.' type: 'A' proto: 'udp'
[iter]     <= rcode: NOERROR
[resl]     <= server: '2a02:a40:2::13' rtt: 3 ms
[resl]   => querying: '84.244.72.20' score: 10 zone cut: 'service.cdn.cra.cz.' m12n: 'pRiMa-vOD-PrEp-SEc.sErVICE.cdn.CrA.Cz.' type: 'NS' proto: 'udp'
[iter]   <= malformed response
[resl]   => querying: '84.244.72.20' score: 10 zone cut: 'service.cdn.cra.cz.' m12n: 'prima-vod-prep-sec.service.cdn.cra.cz.' type: 'NS' proto: 'udp'
[iter]   <= rcode: NOTIMPL
[resl]   => server: '84.244.72.20' flagged as 'bad'
[plan]   plan 'sr01.cdn.cra.cz.' type 'AAAA'
[resl]     => NS is provably without DS, going insecure
[resl]     => querying: '2a02:a40:2::12' score: 10 zone cut: 'cdn.cra.cz.' m12n: 'sR01.CdN.Cra.cz.' type: 'AAAA' proto: 'udp'
[iter]     <= rcode: NOERROR
[ pc ]     => answer cached for TTL=900
[resl]     <= server: '2a02:a40:2::12' rtt: 3 ms
[plan]   plan 'sr01.cdn.cra.cz.' type 'A'
[resl]     => NS is provably without DS, going insecure
[resl]     => querying: '2a02:a40:2:200::2:200' score: 11 zone cut: 'cdn.cra.cz.' m12n: 'sr01.cdn.cRA.Cz.' type: 'A' proto: 'udp'
[iter]     <= rcode: NOERROR
[resl]     <= server: '2a02:a40:2:200::2:200' rtt: 4 ms
[resl]   => querying: '82.99.164.132' score: 11 zone cut: 'service.cdn.cra.cz.' m12n: 'prima-vod-prep-sec.service.cdn.cra.cz.' type: 'NS' proto: 'udp'
[iter]   <= rcode: NOTIMPL
[resl]   => server: '82.99.164.132' flagged as 'bad'
[resl]   => unresolvable NS address, bailing out
[resl]   => no valid NS left
[resl] finished: 8, queries: 6, mempool: 32784 B

Strangely enough, unbound is able to consistently resolve these records (there are at least four of them; just use 01-04 as the number), but kresd usually fails.

Great log! (Though horribly highlighted here.) I tried several times and never got a SERVFAIL as the final result. We will look into this later.

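The problem here is that both of their nameservers answer the NS query for prima-vod-prep-sec.service.cdn.cra.cz. with NOTIMPL (!), which is rather bad, especially with respect to QNAME minimization: the resolver first asks only for the next label below the zone cut with type NS, and these servers reject exactly that probe. Some resolvers apparently manage to work around that reliably; in this specific case knot-resolver sometimes fails, because it flags both servers as 'bad' and runs out of nameservers.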
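
To make the mechanism concrete, here is a rough Python sketch of QNAME minimization with a fallback for servers that answer the minimized NS probe with NOTIMPL. This is only an illustration and not how knot-resolver is actually implemented: `minimized_qnames`, `query_with_fallback`, and the `send` callback are hypothetical names, and a real resolver would also follow referrals between the steps.

```python
from typing import Callable, List


def minimized_qnames(qname: str, zone_cut: str) -> List[str]:
    """List the names a minimizing resolver would ask about, revealing
    one extra label per step below the known zone cut."""
    q = qname.rstrip(".").lower().split(".")
    z = zone_cut.rstrip(".").lower().split(".") if zone_cut.strip(".") else []
    if z and q[-len(z):] != z:
        raise ValueError("qname is not inside the zone cut")
    # Start one label below the zone cut and end with the full name.
    return [".".join(q[-n:]) + "." for n in range(len(z) + 1, len(q) + 1)]


def query_with_fallback(qname: str, qtype: str, zone_cut: str,
                        send: Callable[[str, str], str]) -> str:
    """Probe intermediate names with type NS; if the server answers
    NOTIMPL, stop minimizing and send the full query instead of
    treating the server as bad.

    `send(name, qtype)` is a hypothetical transport callback that
    returns the rcode as a string.
    """
    names = minimized_qnames(qname, zone_cut)
    for name in names[:-1]:
        if send(name, "NS") == "NOTIMPL":
            # The server cannot handle the minimized probe; give up on
            # minimization for this query.
            break
    # Final step: the full name with the type the client asked for.
    return send(names[-1], qtype)
```

With the zone cut 'service.cdn.cra.cz.' from the log, the first probe is the NS query for 'prima-vod-prep-sec.service.cdn.cra.cz.', which is the one both servers reject. In the sketch, that NOTIMPL leads straight to the full A query rather than to 'no valid NS left'.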

For reference, knot-resolver now works around this (upstream master).