DNS-over-TCP: Just a single transaction?

I missed that in my own log. Great finding! AVM answers with the capitalization of the last cached query. To double-check that, I went for

socat - /tmp/kresd/tty/*
> option('NO_0X20', true)

on both my Turris MOX and my Ubuntu 19.10 machine. That disables query-case randomization in Knot Resolver. I waited half an hour (1800 seconds) until both the CNAME and the A record had expired from the DNS cache, and then issued

dig repo.turris.cz

on both devices. Both stayed with UDP and got an IP address. There we go! Fixed.
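For readers who have not met query-case randomization (0x20) before, here is a minimal sketch of the idea in Python. It is not Knot Resolver's code; the function names `randomize_case` and `case_matches` are made up for illustration, and the fixed seed is only there to make the run repeatable.

```python
import random

def randomize_case(qname: str, rng: random.Random) -> str:
    """Flip each letter of the query name to upper or lower case at random."""
    return "".join(
        (ch.upper() if rng.random() < 0.5 else ch.lower()) if ch.isalpha() else ch
        for ch in qname
    )

def case_matches(sent_qname: str, echoed_qname: str) -> bool:
    """Strict check: the answer must echo the query name character for character."""
    return sent_qname == echoed_qname

rng = random.Random(0)  # fixed seed, only to make this sketch repeatable
sent = randomize_case("repo.turris.cz", rng)

# A well-behaved upstream echoes the exact spelling it received.
print(case_matches(sent, sent))
# An upstream answering with some other capitalization fails the check.
print(case_matches(sent, sent.swapcase()))
```

With NO_0X20 set, the resolver sends the name without randomized case, so an upstream that echoes a stale capitalization no longer trips this kind of check.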

This raises four questions:

  1. How do I set that option in my Turris MOX not just temporarily but permanently?
  2. Why did my Turris MOX not cache the first result but create a new query each time?
  3. Why does this affect only CNAME + A(AAA) answers?
  4. Is there anything AVM, the Knot Resolver Team, or the Turris Team should do about this? I noticed there is not only this NO_0X20 but also a broader SAFEMODE option. Should the Turris Team go for that, just for the “Use provider’s DNS resolver” option? Or couple (and mention) it with the DNSSEC option?

Furthermore, I think I found the full-resolver issue as well. When I go for

sudo service kresd@1 restart

Knot Resolver does a full resolve although /etc/knot-resolver/kresd.conf said to use stub/forwarding (on my computer, I enabled verbose via the config file). Finally, while reading the documentation, I found some broken hyperlinks in the section trace. Where do I report things like this?

Finally, finally, the documentation of the module workarounds should link to its source code to give a better understanding. From reading the documentation, I thought that module sets NO_0X20 for all domains, but it only does so for some. Perhaps the documentation could even mention that option while explaining the module: If you have to turn off this randomization for all domains, go for …

Yes, the randomization can be disabled. For communication within a trusted/firewalled network it actually sounds OK to me, on a quick thought at least.

Same as with any kresd setting; see this wiki entry: Turris Documentation

I have no idea about that part so far. (But it’s nothing “wrong” in any case.)

Why do you think it only affects those? I haven’t seen evidence of that yet. (Well, most people don’t really need other records, I think.)

A colleague plans to contact AVM about this. (Maybe he’s written them already.) On our side, I think support just tries to steer people from forwarding to ISP whenever DNS problems are encountered.

Unfortunately, some services sold by ISPs along with connection in practice rely on you using their DNS, which is the main reason why it’s the default :frowning:

In any case, let me also /cc @fasteth about this FRITZ! problem. (TL;DR: forwarding DNS to FRITZ! is causing problems.)

We’ll need more details, but let’s move that upstream. You can choose from a few contact options on: Development – Knot Resolver

Upstream, but it would probably be me making the fix anyway, so I did just that: doc: fix a broken internal link (!958) · Merge requests · Knot projects / Knot Resolver · GitLab

Perhaps, we’ll discuss it internally.

That particular module is a kind of failed dead end in my opinion. The issue is that we would be heaping more hacks because of broken implementations/services, and as a result we would be making the experience worse for good implementations/services (because of the extra maintenance, bugs, etc.). If they are broken and it’s not problem-free to work around on our side, it just won’t work, shifting the burden more towards those who create it.

For example, disabling case randomization worsens security properties in a large fraction of cases (as most names are still not signed by DNSSEC). It would make it easier for off-path attackers (i.e. anyone on the internet) to poison your DNS cache, instead of being vulnerable mainly to on-path attackers (e.g. your ISP).
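To put a rough number on that, here is a back-of-the-envelope sketch. It assumes an off-path attacker has to guess the 16-bit transaction ID, the UDP source port (taken here as a full 16 bits, which is an upper bound since real ephemeral port ranges are smaller), and one extra bit per letter when case randomization is on. The function name `guess_space_bits` is hypothetical.

```python
def guess_space_bits(qname: str, case_randomization: bool,
                     txid_bits: int = 16, port_bits: int = 16) -> int:
    """Rough number of bits an off-path attacker must guess to forge an answer."""
    # Each letter contributes one bit (upper or lower case); digits and dots do not.
    letters = sum(1 for ch in qname if ch.isalpha())
    return txid_bits + port_bits + (letters if case_randomization else 0)

name = "repo.turris.cz"
with_0x20 = guess_space_bits(name, case_randomization=True)
without_0x20 = guess_space_bits(name, case_randomization=False)

print(with_0x20, without_0x20)  # 44 32
```

Under these assumptions, the twelve letters of repo.turris.cz make a blind forgery about 2^12 times harder with randomization than without; short or digit-heavy names gain less.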

Great, works. It would be cool if that were explained within the file /tmp/kresd.config, for example with a link to that wiki, instead of just saying that this file should not be edited because it was automatically generated. That is something for the Turris Team, right?

Shouldn’t Knot Resolver cache the answer until one of its records expires? The one on Ubuntu does this. I would have expected the same from my Turris MOX.
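The expectation behind that question can be written down as a tiny sketch: a CNAME + A chain can be served entirely from cache only while every record in it is still valid, so the shortest TTL decides. The TTL values below are hypothetical, not the real ones for any domain mentioned here.

```python
def chain_cache_lifetime(ttls_seconds):
    """Seconds for which the whole chain can be answered from cache:
    it lasts until the record with the smallest TTL runs out."""
    return min(ttls_seconds)

# Hypothetical TTLs for illustration: CNAME 1800 s, A record 600 s.
lifetime = chain_cache_lifetime([1800, 600])
print(lifetime)  # 600
```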

The plus is the magic. I did not face this issue for just A records (or just AAAA records). I face this issue only for domains which return CNAME plus A(AAA). However, if you do not know a cause, this might be an AVM issue. I will triple-check.

Let us wait for someone from the Turris Team.

By the way, I really love the naming. So often in other products, and here again in security-aware software, SAFE does not mean security but compatibility. Isn’t it nice.

You can reproduce this by enabling verbose mode globally, setting STUB or FORWARD, and then restarting Knot Resolver. It is going to ask the root servers, at least here. I am not sure whether this is needed or even what causes it (a configuration, a missing configuration, …). The problem is, I do not know whether this is a Knot Resolver issue at all.

Do not make a fuss about it. I am not against that module. I am not against its implementation. I simply lost half an hour of my lifetime because I misunderstood the documentation. I tried to enable that module because I thought it would disable query-case randomization globally. If I had known that this module is just a set of workarounds for some specific domains, I would not have wasted my time enabling it. Actually, a link to the source code would have pointed me to the option NO_0X20 (and SAFEMODE). Hence my idea to mention those two options in the documentation already, for those who cannot read Lua.

For example, you mentioned ‘QNAME case randomization’. First, I had to understand what that is. Then, I had to find what relates to that in the documentation (or source code). By that I found ‘workarounds’. Then, I had to play around with that component. Then I had to figure out whether it is set and/or what it does. Nothing. Then I found ‘NO_0X20’. Then, I had to play around with that component. Then I had to figure out whether it is set and/or what it does. And I had to go for the PDF instead of the HTML documentation because ‘NO_0X20’ as search term does not work.

I am just about the documentation. You do not want to invest any time around that module. That is OK, I do not care. I am just suggesting how to enhance the documentation a bit.

TL;DR for the Turris Team:
With an AVM FRITZ!Box as DNS forwarder (default setting in Turris OS), I can block DNS from one device for the other device, because of the security feature 0X20, just by placing a DNS query for the very same domain. Actually, Turris MOX blocks itself and never updates.
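As a toy model of that TL;DR: the class below assumes the forwarder remembers the spelling of the first query for a name and echoes that spelling to everyone afterwards. This is my reading of the observed behaviour, not AVM's actual implementation, and `CaseCachingForwarder` and `accepts` are made-up names.

```python
class CaseCachingForwarder:
    """Toy forwarder that reuses the capitalization of the first query
    it saw for a name (assumed behaviour, not AVM's code)."""

    def __init__(self):
        self._cached_spelling = {}

    def answer(self, qname: str) -> str:
        key = qname.lower()
        # The first query for a name fixes the spelling echoed in later answers.
        return self._cached_spelling.setdefault(key, qname)

def accepts(sent: str, echoed: str) -> bool:
    """A resolver with 0x20 enabled expects its exact spelling back."""
    return sent == echoed

forwarder = CaseCachingForwarder()
first = "rEpO.tUrRiS.cZ"    # query from device A
second = "REPO.turris.CZ"   # later query from device B, same name, other case

a_ok = accepts(first, forwarder.answer(first))
b_ok = accepts(second, forwarder.answer(second))
print(a_ok, b_ok)  # True False
```

Device A's query succeeds and, as a side effect, every later query with a different capitalization gets a mismatching echo until the forwarder's cache entry goes away.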

When I set up forwarding by clicking in Foris and add verbose(true) to my config, even the first queries after restarting the service go through the configured address. Yes, it starts by asking for root NS addresses but does not use them for asking.

As it is now, SAFEMODE does not seem a good default. It wasn’t intended for such use. EDIT: and yes, it decreases security.

Then, enable NO_0X20 by default, somehow, for example when I select the ISP as resolver and disable DNSSEC? By the way, what is the purpose of SAFEMODE?

I re-tested this with two of my own domains (one has DNSSEC and one does not have DNSSEC). With both, I can replicate that CNAME + A(AAA) issue:

  • Knot Resolver does not cache the answer, although it does when I avoid the CNAME.
  • AVM replies with the cached name randomization, although it does not when I avoid the CNAME.

I tried a lot of other DNS record types and different numbers of answers. Nothing else has triggered those two symptoms so far. Consequently, this CNAME thing tricks both implementations into a special mode, which is what makes this issue so complex (you need a CNAME, you need a FRITZ!Box, … then Turris OS goes for TCP). However, that provides a terribly easy workaround when it comes to the Turris updater: avoid the CNAME for repo.turris.cz. Wouldn’t that be a feasible approach for those on old Turris versions?

Ahh. Then it is just cosmetic. Isn’t it possible to avoid that? I got confused by that and thought something else is in a different state; actually without the verbose log, I thought another resolver is around.

Currently it’s automatically used as a fallback in certain error situations.