Omnia DNS randomly responding with no result

pwgen · March 30, 2020, 12:34pm

Hello,

We’re using TO (OpenWrt omnia 15.05 r47055) as AP/DNS/router for two office locations. In both places I have been getting reports about “randomly not working pages”, but I wasn’t able to pin down any specific site because the reports lacked detail.
So far I have found one issue myself and received one valid report.

Issue 1: No result returned by TO even when using +trace

$ dig logs.us-east-2.amazonaws.com @192.168.5.1 +trace

; <<>> DiG 9.10.6 <<>> logs.us-east-2.amazonaws.com @192.168.5.1 +trace
;; global options: +cmd
;; Received 28 bytes from 192.168.5.1#53(192.168.5.1) in 0 ms

No result returned by TO, not using trace:

$ dig logs.us-east-2.amazonaws.com @192.168.5.1

; <<>> DiG 9.10.6 <<>> logs.us-east-2.amazonaws.com @192.168.5.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 43723
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;logs.us-east-2.amazonaws.com.	IN	A

;; Query time: 104 msec
;; SERVER: 192.168.5.1#53(192.168.5.1)
;; WHEN: Mon Mar 30 14:30:08 CEST 2020
;; MSG SIZE  rcvd: 57

OK from any other DNS server (yes, I also checked the Cloudflare DNS servers, which I use as forwarders):

$ dig logs.us-east-2.amazonaws.com @8.8.8.8

; <<>> DiG 9.10.6 <<>> logs.us-east-2.amazonaws.com @8.8.8.8
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 64203
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;logs.us-east-2.amazonaws.com.	IN	A

;; ANSWER SECTION:
logs.us-east-2.amazonaws.com. 59 IN	A	52.95.22.49

;; Query time: 36 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Mon Mar 30 14:30:44 CEST 2020
;; MSG SIZE  rcvd: 73
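
The symptom shown above (NOERROR with an empty answer section from the router, while a public resolver returns a record) can be spotted mechanically. A minimal sketch, assuming dig output in the default format shown above; the helper name `classify_dig` is hypothetical and not part of any Turris tooling:

```shell
#!/bin/sh
# classify_dig: read dig output on stdin and print "<status> <answer-count>".
# Hypothetical helper; it relies only on the header lines dig prints by default.
classify_dig() {
    awk '
        /->>HEADER<<-/ {
            # e.g. ";; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 43723"
            for (i = 1; i <= NF; i++)
                if ($i == "status:") { status = $(i + 1); sub(/,$/, "", status) }
        }
        /^;; flags:/ {
            # e.g. ";; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1"
            for (i = 1; i <= NF; i++)
                if ($i == "ANSWER:") { answers = $(i + 1); sub(/,$/, "", answers) }
        }
        END {
            if (status == "") status = "UNKNOWN"
            if (answers == "") answers = "?"
            print status, answers
        }
    '
}

# Usage (needs network access, so it is left commented out):
#   dig logs.us-east-2.amazonaws.com @192.168.5.1 | classify_dig
```

A line of `NOERROR 0` from the router next to `NOERROR 1` from the forwarder for the same name matches the failure described in this post.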

The second issue concerns the domain hypno.nimja.com: it does not resolve when using TO as DNS, but resolves from other DNS servers.

kresd is configured for TLS forward:

net.ipv6=false
policy.add(policy.all(
  policy.TLS_FORWARD({
    {'1.1.1.1', hostname='cloudflare-dns.com', ca_file='/etc/ssl/certs/DigiCertECCSecureServerCA.pem'},
    {'1.0.0.1', hostname='cloudflare-dns.com', ca_file='/etc/ssl/certs/DigiCertECCSecureServerCA.pem'}
  })
))

Do you have any tips on how we can make DNS lookups work more reliably in an internal network with Turris Omnia?

Best,
Marek Obuchowicz
KoreKontrol Germany

jklaas · March 30, 2020, 2:21pm

I’ve noticed this too since the latest update to 3.11.14 or 15.

A site will return a SERVFAIL sometimes (like google.com this morning for me) until I “dig” for it to my forwarder directly on the router. I’m not sure if that’s an issue with the forwarder or my local kresd instance but since it says SERVFAIL, I’m guessing kresd is the culprit.

It looks like I can run kresd and set verbose(true), but it’s not clear if it attaches to the current process (it looks like it). In any case, I have no idea what the format of the mdb file it puts out is?

vcunat · March 30, 2020, 3:38pm

No process-attaching magic is done, but verbose logs are exactly what I’d like to see. Here’s the way usually recommended by support: https://wiki.turris.cz/doc/en/howto/dnsdebug#enable_verbose_logging

pwgen · March 30, 2020, 4:00pm

Thanks for the link! I tried installing resolver-debug (via web interface and SSH) - it’s failing with:

Unknown package 'resolver-debug'.
Collected errors:
 * opkg_install_cmd: Cannot install package resolver-debug.

Has the wiki page become outdated?

Pepe · March 30, 2020, 4:10pm

You need to update lists first. This can be done in LuCI or you can update it in CLI with this command:

opkg update

and then you can install resolver-debug. This applies to any package, which you want to install.
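
The two steps can be chained so that the install is attempted only after the lists have refreshed successfully. A minimal sketch; the wrapper name `install_resolver_debug` is hypothetical, while both opkg commands are the ones mentioned in this thread:

```shell
#!/bin/sh
# Hypothetical wrapper: refresh the package lists, then install resolver-debug.
# The install step runs only if "opkg update" succeeded.
install_resolver_debug() {
    opkg update && opkg install resolver-debug
}

# Usage on the router (run as root):
#   install_resolver_debug
```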

May I ask you if you are using DNS forwarding in Foris? If you do, could you please tell me to which servers? In any of those cases, would you please send diagnostics and verbose log to tech.support@turris.cz? We will look at it together with @vcunat.

There was not any update related to DNS except unbound for Turris 1.x since 7th January and it was included in Turris OS 3.11.13 released on 16th January.

pwgen · March 30, 2020, 4:42pm

Thanks for your quick response!

I can’t reproduce it anymore.
DNS entries which didn’t resolve in the morning are working now - which is consistent with what our employees observe.
As I already know how to get detailed output, I will get it when problem appears again.

Best regards,
Marek

jklaas · March 30, 2020, 8:42pm

Not in Foris, but yes, I have forwarding enabled. If I enable it in Foris, it breaks the way I want things to work, so I’ve set it up in my custom.conf in a way that doesn’t.

I do DoT to Cloudflare, on IPv4 and IPv6.

I can send verbose logs. But it could take some time as this only happens once a week or less. If you still want me to send logs for that time period, let me know.

vcunat · March 31, 2020, 5:00am

It should be enough to send verbose logs just from the moment when the problem happens. I think you can keep logging all the time – it generates lots of output, but AFAIK rotating logs and deleting old ones works OK.

vookimedlo1 · April 11, 2020, 8:34pm

Hi there,

Any update on this topic? I experience this annoying issue from time to time. I would say this happens only when using TLS DNS, no matter which provider is selected from the drop-down list in Foris. It takes about 14 days to re-create the issue.

vcunat · April 11, 2020, 8:52pm

I believe I’ve received no logs related to this thread so far. No news from my/kresd side.

pete · April 15, 2020, 5:50am

Can confirm the same. Cloudflare TLS, Omnia on latest v3 Firmware.

s1w.cz resolves just fine
www.s1w.cz doesn’t

I have the adblock package installed.

Switching to CZ.NIC TLS dns in Foris resolved the issue for now. We’ll see if it breaks again in a couple of weeks…

EDIT: happens on several domains as well. domain.tld resolves, www.domain.tld doesn’t.

michalko58 · April 15, 2020, 6:03pm

Seems the same on TOS 5.1 HBL. Switching to CZ.NIC TLS helped.

Pepe · April 15, 2020, 6:11pm

We would like to help you with this issue, but so far we have received neither verbose logs (preferred in this case) nor diagnostics, not even through support. For now, we know that there is something going on with Cloudflare. There are different versions of Knot Resolver in different Turris OS versions, but we are going to sync them in the upcoming days.

michalko58 · April 15, 2020, 6:18pm

At first I thought the new AdBlock was the source of these problems, until I saw this thread.

martinu · April 16, 2020, 6:33am

I had a similar experience. Resolution of some names “randomly” stopped working, and after some seconds or minutes it was back. I have a feeling that the problem appeared with update 3.11.14 or 15. Switching DNS from Cloudflare to CZ.NIC TLS fixed it. I made the switch about 2 weeks ago.

gok · April 16, 2020, 11:36am

Hi,

This problem of some domains not being resolved for no apparent reason just happened to me (using Turris Omnia, TOS 3.11.16).
I followed the instructions at https://wiki.turris.cz/doc/en/howto/dnsdebug#enable_verbose_logging, installing resolver-debug and running it.
However, immediately after running resolver-debug, the one domain which made me notice this problem started being resolved normally again.
So, I will keep running resolver-debug, and as soon as I notice another occurrence of this problem, I will send the verbose logs to tech.support at turris.cz.

Hope this helps,
Cheers

pwgen · April 16, 2020, 4:02pm

@gok - I had a similar feeling that the issue was resolved just after I enabled verbose logging.
@Pepe does that sound plausible? Do we restart kresd or somehow flush it when enabling verbose logging? If so, we need to find another way to report the bug (instead of: wait for the bug to happen, enable verbose logging, retry the request). Any ideas?

Marek

vcunat · April 16, 2020, 4:16pm

You can run with verbose logging permanently, if you don’t mind your system logs being dominated by that. Note however that as it is now, changing DNS settings or restarting the router switches off the verbosity.
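
With verbose logging left on permanently, it helps to know exactly when a failure happened so the matching part of the log can be found. A rough sketch of a watcher, assuming `dig` is available on the machine running it and that the router and a reference resolver sit at the addresses used earlier in this thread (192.168.5.1 and 8.8.8.8); the function names, the log file name and the one-minute interval are arbitrary choices, not a Turris recommendation:

```shell
#!/bin/sh
# resolves NAME SERVER: succeed if SERVER returns at least one line of answer data.
# Lines starting with ";" (dig diagnostics such as timeouts) are ignored.
resolves() {
    dig +short +time=3 +tries=1 "$1" "@$2" 2>/dev/null | grep -v '^;' | grep -q .
}

# check_name NAME ROUTER REFERENCE: print a timestamped line when the reference
# resolver answers but the router does not (the symptom reported in this thread).
check_name() {
    if resolves "$1" "$3" && ! resolves "$1" "$2"; then
        echo "$(date '+%Y-%m-%d %H:%M:%S') MISMATCH $1: $3 answers, $2 does not"
    fi
}

# Usage: poll a few names once a minute and append mismatches to a file.
#   while true; do
#       for n in logs.us-east-2.amazonaws.com hypno.nimja.com; do
#           check_name "$n" 192.168.5.1 8.8.8.8
#       done >> dns-mismatch.log
#       sleep 60
#   done
```

The timestamps in the mismatch file can then be used to cut the relevant window out of the verbose kresd log before sending it to support.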