Kresd crashes with policy configuration and FQDN as router name

Since today, kresd has been crashing on startup for no apparent reason:

2021-01-26 20:39:08 info kresd[8645]: [ta_update] refreshing TA for .
2021-01-26 20:39:08 info kresd[8645]: [ta_update] next refresh for . in 24 hours
2021-01-26 20:39:16 err kresd[8645]: Assertion failed: pkt && pkt->wire (../lib/utils.c: kr_pkt_make_auth_header: 320)

Configuration:

config resolver 'common'
        list interface '0.0.0.0'
        list interface '::0'
        option port '53'
        option keyfile '/etc/root.keys'
        option verbose '0'
        option msg_buffer_size '4096'
        option msg_cache_size '20M'
        option net_ipv6 '1'
        option net_ipv4 '1'
        option prefered_resolver 'kresd'
        option ignore_root_key '0'
        option prefetch 'yes'
        option dynamic_domains '1'
        option static_domains '1'
        option forward_custom '00_odvr-cznic'
        option forward_upstream '0'


config resolver 'kresd'
        option rundir '/tmp/kresd'
        option log_stderr '1'
        option log_stdout '1'
        option forks '1'
        option include_config '/etc/kresd/custom.conf'
        list rpz_file '/etc/kresd/adb_list.overall'
        option keep_cache '0'
        list hostname_config '/etc/hosts'

The only change since the previous (“working”) state is that I set an FQDN for the network instead of “home” or “lan”.

The TurrisOS version this occurs in is 3.11.22.

It looks like something in my custom configuration that was previously working is now failing. Checking.

EDIT:

This breaks:

local ffi = require('ffi')
local function genRR (state, req)
        local answer = req.answer
        local qry = req:current()
        if qry.stype ~= kres.type.A then
                return state
        end
        ffi.C.kr_pkt_make_auth_header(answer)
        answer:rcode(kres.rcode.NOERROR)
        answer:begin(kres.section.ANSWER)
        answer:put(qry.sname, 900, answer:qclass(), kres.type.A, '\192\168\10\67')
        return kres.DONE
end

policy.add(policy.suffix(genRR, { todname('internal.xxx.net.') }))

@vcunat Does the domain handled by the function above need valid DNSSEC records? My domain registrar does not offer DNSSEC.

And if I add

if answer == nil then return nil end

before the first if, it does not crash, but the function does not work anymore.
EDIT: It looks like the policy is only applied when the query ends in NXDOMAIN. I assume this is caused by the FQDN on the router.

No, this is caused by an API change in kresd 5.2: using req.answer directly isn’t supported anymore.

-        local answer = req.answer
+        local answer = req:ensure_answer()
+        if answer == nil then return nil end

(well, the second line most likely won’t ever be needed in your case, but it’s just cleaner)

EDIT: unfortunately, ensure_answer didn’t exist before 5.2.0, so you need to apply the change at the right time (together with the upgrade), or make the code more complicated so that it works with both versions.
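One way to do the “more complicated” variant is sketched below. This is not from the kresd documentation: get_answer is a hypothetical helper name, and the sketch assumes that on kresd older than 5.2.0 calling the missing ensure_answer method raises an error, which pcall catches so the code can fall back to req.answer (guaranteed to exist on those versions).

```lua
-- Hypothetical helper: tries to work on kresd both before and after 5.2.0.
-- Define it before genRR so genRR can call it.
local function get_answer(req)
	-- On 5.2.0+ ensure_answer() makes sure the answer packet exists and
	-- returns it; it may return nil, in which case no answer will be sent.
	local ok, answer = pcall(function () return req:ensure_answer() end)
	if not ok then
		-- Older kresd: the method is missing (pcall caught the error),
		-- but req.answer is always allocated there.
		answer = req.answer
	end
	return answer
end
```

Inside genRR, local answer = req.answer would then become local answer = get_answer(req), keeping the nil check. Note that pcall also swallows any other error raised inside ensure_answer, so this trades some strictness for compatibility.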

Thanks, your answer came right as I dug through the API docs. :wink:

Here’s the fixed version for posterity:

local ffi = require('ffi')

local function genRR (state, req)
        local answer = req:ensure_answer()
        local qry = req:current()
        if answer == nil then return nil end
        if qry.stype ~= kres.type.A then
                return state
        end
        ffi.C.kr_pkt_make_auth_header(answer)
        answer:rcode(kres.rcode.NOERROR)
        answer:begin(kres.section.ANSWER)
        answer:put(qry.sname, 900, answer:qclass(), kres.type.A, '\192\168\10\67')
        return kres.DONE
end

policy.add(policy.suffix(genRR, { todname('internal.xxx.net.') }))

Thanks @vcunat!
Was this marked as deprecated before the update? It knocked out our DNS for several hours until I found this fix…

No, it wasn’t, I’m afraid. 5.2.0 removed the guarantee that the answer packet always exists and at the same time added the ensure_answer helper for this. I didn’t realize we had suggested putting this into configuration in a couple of places on this forum. It’s listed as an incompatible change in the 5.2.0 release notes and in its upgrade guide, but Turris users typically don’t need to follow those, so it wasn’t even linked from the Turris NEWS.

If there is a better way to achieve what we did in this code, I’m all ears…

Actually, many use cases can be covered by an API added in 5.1.0. The example above becomes:

local genRR = policy.ANSWER({
	[kres.type.A] = { rdata=kres.str2ip('192.168.10.67'), ttl=900 },
}, true)
policy.add(policy.suffix(genRR, { todname('internal.xxx.net.') }))

(The true argument makes kresd also answer other query types locally with an empty (NODATA) answer, in particular AAAA.)
Docs: https://knot-resolver.readthedocs.io/en/stable/modules-policy.html#policy.ANSWER
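If the internal host should also get a real AAAA record instead of the empty answer, the same table can carry another entry. A minimal sketch, where fd00::43 is a hypothetical placeholder address and kres.str2ip is assumed to accept IPv6 as well as IPv4 strings:

```lua
-- A and AAAA are answered from the table; with nodata=true any other
-- type (e.g. MX) still gets an empty NODATA answer.
local genRR = policy.ANSWER({
	[kres.type.A]    = { rdata=kres.str2ip('192.168.10.67'), ttl=900 },
	[kres.type.AAAA] = { rdata=kres.str2ip('fd00::43'), ttl=900 }, -- placeholder address
}, true)
policy.add(policy.suffix(genRR, { todname('internal.xxx.net.') }))
```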


Nice! It’s much simpler and more concise. I’ll adjust my configuration after reading the documentation. Does it work only with single entries, or also with multiple ones, e.g. todnames({"foo.bar.baz", "baz.bar.baz"})?

policy.suffix does work with a table (list) of names. Note that you want policy.todnames, i.e. that function is not in the global scope.
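Combining the two answers, a sketch for several suffixes at once; foo.bar.baz. and baz.bar.baz. are just the placeholder names from the question. policy.todnames converts the list of strings into the domain-name form that policy.suffix expects:

```lua
local genRR = policy.ANSWER({
	[kres.type.A] = { rdata=kres.str2ip('192.168.10.67'), ttl=900 },
}, true)
-- One rule covering every name in the list (and their subdomains).
policy.add(policy.suffix(genRR, policy.todnames({ 'foo.bar.baz.', 'baz.bar.baz.' })))
```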
