DNS (kresd): Delegating a sub-domain of .lan to an external resolver

Hi,

I’ve recently migrated a moderately complex existing network from an ancient modded ASUS router to an Omnia with Turris OS 4.x. As part of that work, I needed to delegate a sub-domain of .lan to an external resolver, and I wanted to do that in a way that is as robust as possible, i.e. least likely to break in the presence of future Turris OS updates.

The existing documentation (mainly here and here) is a bit sparse on this, so I’m sharing my setup in the hope that it will be useful to others, and perhaps we can use it to improve the “official” documentation.

TL;DR: This is essentially a work-around for the lack of a policy.add_front() or similar API in knot-resolver. Full explanation follows.

After some experimentation using the “live” kresd console, I ended up with the following in /etc/kresd/custom.conf:

-- Custom configuration starts here
local function copyPolicyRules(r)
  local r1 = {}
  for k, v in pairs(r) do
    table.insert(r1, v.cb)
  end
  return r1
end

local function addPolicyRules(r)
  for k, v in pairs(r) do
    policy.add(v)
  end
end

-- Our configuration file gets included AFTER the Turris-generated one, but we
-- need to add our policy rules BEFORE any added by the Turris-generated config.
-- Take a copy of the compiled rule data, i.e. policy.rules[].cb.
local rulesCopy = copyPolicyRules(policy.rules)
-- Drop all existing policy rules.
policy.rules = {}

-- Add our custom rules:

-- Add an uncached STUB zone, forwarding queries for 'ci.lan' to
-- the dnsmasq instance running on 10.70.22.2.
local customStubZones = policy.todnames({'ci.lan'})
policy.add(policy.suffix(policy.FLAGS({'NO_CACHE'}),  customStubZones))
policy.add(policy.suffix(policy.STUB({'10.70.22.2'}), customStubZones))

-- Add the previously saved copy of the system-generated rules.
addPolicyRules(rulesCopy)

To actually enable loading the custom.conf, run:

uci set resolver.kresd.include_config=/etc/kresd/custom.conf
uci commit
service kresd restart

Note that the policy.FLAGS({'NO_CACHE'}) rule is not required, but it makes debugging the setup easier. In my case I actually want the uncached behaviour, as that DNS zone has ephemeral machines coming and going in it and I would like clients to react to those changes immediately.

Technical explanation

What I’m trying to do here is essentially replace part of the DNS tree, as documented in the knot-resolver documentation here. However, the mechanism for including a custom kresd configuration in Turris OS always includes the configuration after any system-generated configuration.

Such system-generated configuration may contain catch-all FORWARD rules, or in fact anything else. Given that policy is evaluated in order, we need to ensure that our custom rules are evaluated before any catch-all rules installed by Turris OS. We do that by taking a copy of the existing policy.rules[...].cb (the compiled callback for each rule), dropping all of policy.rules[] and then adding what we need in the right order, with the system-generated rules last.
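
To make the ordering problem concrete, here is a toy model in plain Lua. It is not kresd's implementation: suffix_rule() and evaluate() are hypothetical helpers, and the model is simplified to "first matching rule wins", ignoring that some actions (such as the NO_CACHE flags rule above) let evaluation continue to the next rule.

```lua
-- Toy model of ordered rule evaluation; all names here are hypothetical.
local function suffix_rule(action, suffix)
  -- Returns a callback yielding `action` when `qname` is `suffix` or lies
  -- below it. An empty suffix acts as a catch-all.
  return function(qname)
    if suffix == '' or qname == suffix
        or qname:sub(-(#suffix + 1)) == '.' .. suffix then
      return action
    end
    return nil
  end
end

local function evaluate(rules, qname)
  -- Walk the rules in list order; the first one returning an action wins.
  for _, rule in ipairs(rules) do
    local action = rule(qname)
    if action then return action end
  end
  return 'RESOLVE_NORMALLY'
end

-- Generated catch-all first, custom rule appended: the catch-all shadows it.
local appended = {
  suffix_rule('FORWARD upstream', ''),
  suffix_rule('STUB 10.70.22.2', 'ci.lan'),
}
-- Custom rule in front: it gets the first chance to match.
local prepended = {
  suffix_rule('STUB 10.70.22.2', 'ci.lan'),
  suffix_rule('FORWARD upstream', ''),
}

print(evaluate(appended, 'build1.ci.lan'))
print(evaluate(prepended, 'build1.ci.lan'))
print(evaluate(prepended, 'example.org'))
```

In the appended case the ci.lan query is answered by the catch-all; in the prepended case it reaches the stub rule, while unrelated names still fall through to the catch-all.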

Questions

These are mostly for Turris OS developers / knot-resolver experts (ping @Pepe, @vcunat):

  1. Do you anticipate any problems with this approach in the future? As stated, my goal is to ideally not have it break during updates.
  2. I’m not a Lua expert; if someone could confirm that the above copy-based approach does not produce any unexpected side-effects, that’d be great.
  3. I noticed that the Foris “Configuration backup” .tar.bz2 only includes some subset of files in /etc. It’d be useful to have a well-defined place for custom configuration files such as /etc/custom, to ensure that these get backed up and restored(!) with all of the system configuration.
  4. (minor) Can we get a rlwrap package or similar tool? It’d make live coding against the kresd control socket so much easier… I’d also suggest installing a kresd-console script encapsulating the socat invocations.
  5. (minor) service kresd restart produces the following errors on the console on my system. These appear to be unrelated to the presence of any custom.conf:
    syntax error. Last token seen: +
    Garbled time
    


policy.rules

We are avoiding generating anything into policy.rules, exactly because of what you’re dealing with. It’s a little hacky now, with forwarding rules being put into policy.special_names, but I believe it should work well.

Upstream (myself) has plans for the following months to try designing better configuration model/API for policy-related stuff, so over longer term I expect the experience to improve. Until then I believe we do want to keep the current approach of never generating anything into policy.rules (Q1).

(Q2: I would do the table-copying differently because of policy.add side-effects and order-preservation, but that seems no longer relevant.)

Caching

Without the NO_CACHE rule you may run into subtle problems, as explained in the upstream docs you linked. If someone tries to resolve, e.g., the name lan1, kresd won’t use this rule and will obtain records proving that a range of names around lan1 does not exist (including lan itself). Later, when a positive foo.lan record isn’t found in the cache, those records can be used to generate a negative reply without asking upstream.

You can view it as a consequence of having just a single shared cache… or the config API not well expressing what people typically want.

rlwrap

Upstream has a kresc prototype, including history and tab-completion. It’s buggy, unsupported and abandoned. We mainly try to avoid non-experts having to use the socket directly – provide better UI (re)Foris for most common stuff, perhaps some “howtos” for less common stuff.

others

  • Q3 is outside my expertise; perhaps someone else can help.
  • Q5: hmm, I’m not sure. I thought those lines had been fixed by someone already; I can’t really remember. I’m not getting them in 5.0.0 (current :turtle:HBT).

I originally started out with Foris set to use the upstream ISP nameservers. In that case, the FORWARD rule gets added to policy.rules, not policy.special_names, which is why I started with this approach. Since then (upstream has broken IPv6, as usual) I’ve switched Foris to use the CZ.NIC resolvers, and that does use policy.special_names.

Can you elaborate on that? It still seems relevant for people who are using upstream DNS.

My system appears to be on hbs (is there any other way to check apart from running switch-branch and ^C -ing it?)

FWIW, the relevant bits from /etc/os-release are:

NAME="TurrisOS"
VERSION="4.0.5"
BUILD_ID="ab9d1bf"

You can modify your /etc/config/backups and add the files that you want backed up. I’m not sure whether we should add this by default, but we will discuss it further.

This is partially fixed in Turris OS 5.0 (currently in RC), as @vcunat pointed out. There are going to be more fixes, which get rid of more warnings/errors in Knot Resolver and resolver-conf. These are currently in review and should be part of Turris OS 5.0 RC2.


Thanks for that.

Just a thought about the big picture of custom configuration:

One advantage of having an explicitly “blessed” location for custom configuration files would be that the presence of files there could serve as a “hint” for the updater that customizations are in place. This information could then be used to e.g. discourage potentially incompatible updates, or explicitly require approval regardless of the user’s settings.

But yea, balancing between allowing customization and still having a sanely supportable whole is a tricky question.

:heart: I consider that a bug on our side; I tested a fix locally already.

Anyway, for inserting into the table I’d use:

local rules_gen = policy.rules
policy.rules = {}

policy.add(...)

for _, v in ipairs(rules_gen) do -- ipairs preserves order
    table.insert(policy.rules, v)
end
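
The same pattern could be wrapped in a small helper, which is roughly what a policy.add_front() would provide. This is only a sketch: with_rules_in_front() is a hypothetical name, not part of knot-resolver, and `mock` is a stand-in that assumes only what the thread states (a rules list whose entries carry a cb field, and an add function that appends to it), so that the snippet runs outside kresd.

```lua
-- Hypothetical helper; not a knot-resolver API. `pol` is the policy module,
-- or any table with a `rules` list and an `add` function appending to it.
local function with_rules_in_front(pol, add_custom)
  local generated = pol.rules          -- keep the generated rule entries
  pol.rules = {}
  add_custom()                         -- expected to call pol.add(...)
  for _, rule in ipairs(generated) do  -- ipairs preserves order
    table.insert(pol.rules, rule)      -- re-insert entries as they were
  end
end

-- Minimal stand-in for the real module (an assumption for illustration).
local mock = { rules = {} }
function mock.add(cb)
  local entry = { cb = cb }
  table.insert(mock.rules, entry)
  return entry
end

mock.add('generated-1')
mock.add('generated-2')
with_rules_in_front(mock, function()
  mock.add('custom-stub')
end)

for i, rule in ipairs(mock.rules) do
  print(i, rule.cb)
end
```

Inside custom.conf one would presumably pass the real policy table and call policy.add(policy.suffix(...)) in the callback; that usage is untested here. Unlike the copy-based version in the first post, the saved entries are re-inserted directly rather than passed through policy.add a second time.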

In (re)Foris there is the “about” tab that shows it, but I’m not sure if that feature is in 4.0.5 already (I re-checked in 5.0.0).


This is included since Turris OS 4.0.2.


For reference, merged as knot-resolver: use the policy hack consistently (ca13263a) · Commits · Turris / Turris OS / Turris OS packages · GitLab (note that you won’t see this fix in Turris OS 4.x, though)


@vcunat: Just for my understanding of what you did there: The bug was that you were inserting the result of policy.add(rule) rather than rule itself, and as a side-effect of the policy.add() the rule got added into policy.rules, so it kind of worked.

Am I right?

No. There’s the regular policy.rules list that gets manipulated by policy.add. To avoid the user-configuration issues, we switched generated FORWARD rules to using a different list (policy.special_names)… but somehow we forgot to do the switch in all cases, and this commit fixes that.

Got it. I misread the diff and missed the addition of the outer table.insert().

Thanks for the explanation.