DNS broken after factory reset and update to 3.11.1

Right, you got into a weird state at that point: the resolver didn’t read from its /tmp/kresd/tty/* socket, yet the process was apparently still alive (I’ve never seen that combination) and it was replying SERVFAIL to all DNS requests. Because the tty socket was dead, kresd couldn’t be reconfigured while running, so unfortunately the logs show basically nothing about its state. Therefore at the moment I see no leads for finding out what exactly was wrong.

The command returns the new “verbosity level”, i.e. its output meant you turned verbose logging on successfully. The logs you sent me suggest DNS worked OK in that state. I can’t judge whether that was just because the process got restarted (that’s what happens when you change its configuration) or because of the actual config changes.

Using the ISP’s DNS without validating DNSSEC signatures is probably the least problematic setup, i.e. the closest to what the ISP’s other customers get. Well, Foris explains what DNSSEC is about. Forwarding to the ISP’s servers does sometimes break DNSSEC, depending on how broken their servers are, but as you’re changing ISPs now, it’s probably not worth digging into the details until you’ve switched. Again, verbose logs almost always show us exactly why resolution fails.

If it happens with factory settings on known hardware and seems to affect a number of users, surely you could reproduce this somehow on your side, right?

For me, the issue happened frequently enough that I had to fall back to my ISP supplied router lest I be lynched by my family.

The DNS kept breaking after a day or so, even with “Use provider’s DNS resolver” and “Disable DNSSEC” checked, so my housemates got another router while I was away.

Today, we switched our ISP from Comcast to Verizon FiOS, and my housemates want to continue using the new router they got. At some point I’ll plug the Turris Omnia into their router to use as a secondary access point, and then I can let you know if it’s still a problem.

I added another in-depth problem description in this thread:

It was brought to my attention that it might be a good idea to share it on this thread.

In the meantime I discovered that when you opt to have a LAN domain configured in Foris for your local machines, it sets the dynamic lease boolean in the common section of /etc/config/resolver to true. That causes all kinds of problems, because the Python script that is called when this value is set is badly broken.

And I also have the problem that Foris shows two fails for the DNS check, although DNS is working at the moment (let’s see for how long, though).


@vcunat, is that included in what you described as “known to us”? Did you have a look at my logs above?

I’m also curious why Foris shows DNSSEC status OK while DNSSEC is disabled (using forwarding to the provider’s DNS)…

I followed your explanation, disabled the Foris setting “Enable DHCP clients in DNS” and re-enabled DoT (Cloudflare).
I also enabled logging right away.
Let’s see where that gets us.

At least yours shows OK; mine shows fail for DNS and DNSSEC although kresd is up and running and serving requests.

The log above shows basically just [priming] cannot resolve address ..., i.e. verbose logging wasn’t enabled, so I can’t really learn much from it (except that DNS apparently doesn’t work). I suppose it came out this short because the “verbose button” makes a non-persistent change that is removed when the resolver is restarted or even just reconfigured.

We’ll see. You can check that every DNS request generates multiple lines in the log, e.g. several [resl] lines for each one. By default my Omnia can usually keep only the last several hours of these logs, as the volume is significant.
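One quick way to do that check is to count the [resl] lines in the syslog. This is a minimal sketch, assuming the syslog is written to /var/log/messages (pass another path as the first argument if yours differs); the function name is hypothetical.

```shell
#!/bin/sh
# Count kresd verbose resolution lines ("[resl]") in a syslog file.
# Assumption: the default path /var/log/messages; override it via $1.
count_resl_lines() {
    log_file="${1:-/var/log/messages}"
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 1
    fi
    # -F matches "[resl]" literally, -c prints the number of matching lines
    grep -cF '[resl]' "$log_file"
}
```

If the count stays at zero while clients are making DNS requests, verbose logging is not active (for example because the resolver was restarted in the meantime).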

IIRC the test simply checks some known public names: some that use more complex DNSSEC and should resolve fine, and some that must fail due to intentionally broken DNSSEC. Your ISP might be protecting you even if you don’t validate locally. Good ISPs do that, I believe, but I still find it better to validate closer to end devices anyway, ideally on each machine, but that’s out of the scope of Turris.
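A hypothetical sketch of that logic, taking the DNS response codes for one properly signed name and one name with deliberately broken DNSSEC. It is not the actual Foris code, only an illustration of why DNSSEC can show as OK when some validator on the path (e.g. the ISP’s) rejects the broken name, even though local validation is disabled.

```shell
#!/bin/sh
# Hypothetical classifier: a validly signed name must resolve,
# and a name with intentionally broken DNSSEC must fail.
# Arguments are DNS response codes such as NOERROR or SERVFAIL.
classify_dns_check() {
    good_rcode="$1"   # rcode for the name with valid DNSSEC
    bad_rcode="$2"    # rcode for the name with broken DNSSEC
    if [ "$good_rcode" != "NOERROR" ]; then
        # Nothing resolves, so neither part of the check can pass
        echo "DNS: fail, DNSSEC: fail"
    elif [ "$bad_rcode" = "SERVFAIL" ]; then
        # Some validator on the path (local kresd or the ISP) rejected the bad signatures
        echo "DNS: ok, DNSSEC: ok"
    else
        echo "DNS: ok, DNSSEC: fail"
    fi
}
```

For example, `classify_dns_check NOERROR SERVFAIL` prints `DNS: ok, DNSSEC: ok`, which is what you would see behind a validating ISP resolver with local validation switched off.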

Shame on me: I started debugging mode, but didn’t issue your `echo 'verbose(true)' | socat - /tmp/kresd/tty/"$(pgrep kresd)"`. :frowning: The log above is the snippet (I have another 400 kB of logs) covering just the time frame when DNS resolution stopped working, but without verbose logging it’s worth nothing.
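The one-liner above builds a broken socket path if `pgrep kresd` returns more than one PID. A minimal sketch that sends the same command to every kresd instance that has a control socket, assuming the /tmp/kresd/tty/<pid> layout from this thread and that socat is installed; the function name and the `KRESD_TTY_DIR` override are hypothetical.

```shell
#!/bin/sh
# Send a Lua command to every running kresd instance via its control socket.
# Assumptions: sockets live in /tmp/kresd/tty/<pid>, socat is installed.
# KRESD_TTY_DIR is a hypothetical override of the socket directory.
kresd_cmd() {
    cmd="$1"
    tty_dir="${KRESD_TTY_DIR:-/tmp/kresd/tty}"
    sent=0
    for pid in $(pgrep kresd); do
        sock="$tty_dir/$pid"
        # Skip PIDs without a control socket
        [ -S "$sock" ] || continue
        printf '%s\n' "$cmd" | socat - "UNIX-CONNECT:$sock"
        sent=$((sent + 1))
    done
    # Non-zero exit status means no instance received the command
    [ "$sent" -gt 0 ]
}
```

Usage: `kresd_cmd 'verbose(true)'`; each instance should reply with the new verbosity level. Since this change is not persistent, it has to be re-sent after every resolver restart or reconfiguration.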

It does. And you are right about the syslog size: it will only hold a few hours with that much detail about every DNS request.
Right now uptime is 28 hours without DNS breaking. I really have a feeling that what @marcerlser found out could be the key.

To make the failure happen again, I ticked the checkbox in Foris → DNS → “Enable DHCP clients in DNS”, which sets the following in /etc/config/resolver → common:

	option dynamic_domains '1'

and adds the following line to /etc/config/dhcp → dnsmasq:

	option dhcpscript '/etc/resolver/dhcp_host_domain_ng.py'
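To see whether a router currently has these two settings applied, here is a small sketch that greps the two UCI files for exactly the lines quoted above. It assumes the options are written with single quotes as shown; the function name is hypothetical.

```shell
#!/bin/sh
# Report whether the "Enable DHCP clients in DNS" side effects are present.
# Paths default to the UCI files named above; pass other paths for testing.
check_dynamic_domains() {
    resolver_cfg="${1:-/etc/config/resolver}"
    dhcp_cfg="${2:-/etc/config/dhcp}"
    found=0
    if grep -q "option dynamic_domains '1'" "$resolver_cfg" 2>/dev/null; then
        echo "resolver: dynamic_domains enabled"
        found=1
    fi
    if grep -q "option dhcpscript '/etc/resolver/dhcp_host_domain_ng.py'" "$dhcp_cfg" 2>/dev/null; then
        echo "dhcp: dhcpscript hook present"
        found=1
    fi
    # Exit status 0 means at least one of the two settings is active
    [ "$found" -eq 1 ]
}
```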

DoT is activated for Cloudflare, DNSSEC is deactivated.
Afterwards I started debugging and ran your `echo 'verbose(true)' | socat - /tmp/kresd/tty/"$(pgrep kresd)"`.
So now I hope the failure happens tonight while the verbose logs are active, because family needs mean I have to disable all of this again tomorrow morning.

edit1 (update after 1 day): I didn’t have time to deactivate logging and the other measures this morning. I came home this evening with mixed feelings because DNS didn’t crash today: lucky, because my family didn’t kill me over a broken internet connection; sad, because the real cause is still not found and the experiment has to continue. But it seems syslog still contains all logs since my restart yesterday evening (ca. 576k rows/65 MiB).
PS @marcerlser: Foris now correctly shows DNS working / DNSSEC not working.


My Foris still shows DNS not working, which is clearly just wrong. I don’t know why it does this. Also, could the maintainers please fix the problems caused by this very buggy Python script when you tick the checkbox for the local LAN domain in Foris?

Just installed the newest release, 3.11.2, and now my DNS and DNSSEC show OK. The log is nice and quiet; however, I haven’t tried enabling the local LAN domain yet. I will probably test it sometime this week.

update 2 (after 10 days): Still no crash, it seems my router is kidding me :face_with_symbols_over_mouth: I stopped verbose mode, saved the log (ca. 1.48M rows/161 MiB!) and started updating to 3.11.2. If it happens again, I will restart logging and share it in this thread.


This resolved the issue in my case, thanks!
