VPN and DNS issue

Hi there,

I have configured my Turris Omnia (TO) router as a VPN client using a .opvn file, referenced in /etc/config/openvpn like this:

config openvpn 'vpn'
    option enabled '1'
    option config '/etc/openvpn/vpn_config_from_my_vpn_provider.opvn'

The network configuration for the VPN interface is:

config interface 'vpntun0'
    option proto 'none'
    option ifname 'tun0'

The firewall configuration is:

config zone
    option name 'vpn'
    option network 'vpntun0'
    option masq '1'
    option mtu_fix '1'
    option input 'ACCEPT'
    option output 'ACCEPT'
    option forward 'DROP'

config forwarding
    option family 'ipv4'
    option src 'lan'
    option dest 'vpn'

Looking at /var/log/messages, I see that the VPN starts correctly. When I check the VPN connection using a site like https://www.expressvpn.com/what-is-my-ip, I also see the VPN's IP address, and there are no DNS or WebRTC leaks.

So far, so good! :slight_smile:

The issue is related to web browsing, and especially to domain name resolution. From my Arch Linux host on the LAN, I can, for example, browse forum.turris.cz without any problem for a while. Then I can no longer access the website: Firefox displays the "Looking up forum.turris.cz" message at the bottom left for a few seconds, then shows its error page "We can't connect to the server at forum.turris.cz". I have to wait a bit and retry several times, and after a while I can access the website again.

Note that while I can't access forum.turris.cz, I can still reach github.com, for example. And sometimes I can't reach github.com while forum.turris.cz works fine.

When I connect to the TO over ssh and use dig to get the IP address of the domain name, the first attempts may return no result, and then suddenly I get the IP address.
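To capture this intermittent behaviour instead of retrying by hand, a small probe loop can record each attempt. This is a minimal Python sketch, assuming the lookup goes through the system resolver (on the router that is kresd on 127.0.0.1, per /etc/resolv.conf); `probe` and `default_resolve` are hypothetical helper names, and the resolver is injectable so the loop can be tested offline.

```python
import socket
import time
from typing import Callable, List, Optional


def default_resolve(name: str) -> Optional[str]:
    """Resolve `name` to one IPv4 address via the system resolver, or None on failure."""
    try:
        infos = socket.getaddrinfo(name, None, socket.AF_INET)
    except socket.gaierror:
        return None
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (host, port).
    return infos[0][4][0] if infos else None


def probe(name: str, attempts: int = 10, delay: float = 1.0,
          resolve: Callable[[str], Optional[str]] = default_resolve) -> List[Optional[str]]:
    """Try to resolve `name` several times and record each result (None means failure)."""
    results: List[Optional[str]] = []
    for i in range(attempts):
        results.append(resolve(name))
        if i < attempts - 1:
            time.sleep(delay)  # spread attempts out to catch the failure window
    return results
```

For example, `probe("forum.turris.cz", attempts=20)` run while the problem occurs should show a run of `None` entries followed by addresses, matching what dig shows.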

Knot Resolver is configured not to forward DNS requests to upstream DNS servers (option forward_upstream '0'). I also tried setting option net_ipv6 '0' in the Knot Resolver configuration file (/etc/config/resolver), but it did not resolve the issue.

I do not have any problem at all when the VPN is stopped.

So is there something else I have to configure? Why are some domain names suddenly not resolved when they worked just before? Are there any logs I can check to find the origin of the issue?

Any help is welcome! :wink:

What is in your /etc/openvpn/vpn_config_from_my_vpn_provider.opvn file?

It sounds as if your VPN provides a different DNS server setting.
Check your DNS servers before and after the VPN connects.

Maybe one of the DNS servers cannot resolve a name, so the lookup fails, and later the client falls back to a different DNS server that can resolve it.
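The fallback hypothesis can be sketched in a few lines of Python. This is only an illustration of the idea, not how kresd or any specific client picks servers; `query(server, name)` is a hypothetical stand-in for a real lookup against one server.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical lookup: returns an address string, or None when that server fails.
Query = Callable[[str, str], Optional[str]]


def resolve_with_fallback(name: str, servers: Iterable[str],
                          query: Query) -> Tuple[Optional[str], Optional[str]]:
    """Ask each server in order and return (server_that_answered, address)."""
    for server in servers:
        address = query(server, name)
        if address is not None:
            return server, address
    # Every server failed: this is what the browser reports as "can't connect".
    return None, None
```

If the first server in the list is the one that intermittently fails, lookups would succeed or fail depending on whether the fallback kicks in, which would match the on/off pattern described above.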

Hello @Fenevadkan! Thank you for your answer.

Here is the content of the .opvn file:

dev tun
fast-io
persist-key
persist-tun
nobind
remote ...

remote-random
pull
comp-lzo no
tls-client
verify-x509-name ...
ns-cert-type ...
key-direction 1
route-method exe
route-delay 2
tun-mtu 1500
fragment 1300
mssfix 1450
verb 3
cipher ...
keysize ...
auth ...
sndbuf ...
rcvbuf ...
auth-user-pass ...

<ca>
...
</ca>
<cert>
...
</cert>
<key>
...
</key>
<tls-auth>
...
</tls-auth>

Check your DNS servers before and after vpn connection.

I checked these files before and after connecting to the VPN:

/etc/resolv.conf (which is a symbolic link to /tmp/resolv.conf)

search lan
nameserver 127.0.0.1

/tmp/resolv.conf.auto

# Interface wan
nameserver <DNS server 1 from my ISP>
nameserver <DNS server 2 from my ISP>
search <domain name of my ISP>

The content of these files does not change when the VPN comes up. Are there other files I should check?
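Comparing the nameserver entries before and after the VPN comes up can be scripted rather than eyeballed. A minimal Python sketch, assuming the standard resolv.conf syntax shown above (`nameserver <address>` lines, `#` comments); `nameservers` and `dns_changed` are hypothetical helper names.

```python
from typing import List


def nameservers(text: str) -> List[str]:
    """Return the addresses from `nameserver` lines of a resolv.conf-style file."""
    servers = []
    for line in text.splitlines():
        fields = line.split()
        # Comment lines start with '#', so fields[0] is never "nameserver" for them.
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def dns_changed(before: str, after: str) -> bool:
    """True if the list of nameservers differs between two snapshots."""
    return nameservers(before) != nameservers(after)
```

Usage: save a copy of /tmp/resolv.conf.auto before starting the VPN, then compare it with the current file using `dns_changed(old_text, open("/tmp/resolv.conf.auto").read())`.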

During the vpn start, I see this line in /var/log/messages:

2018-12-15 11:09:44 notice openvpn(vpn)[8785]: PUSH: Received control message: 'PUSH_REPLY,redirect-gateway def1,dhcp-option DNS xxx.xxx.xxx.xxx,route xxx.xxx.xxx.xxx,topology xxxx,ping 10,ping-restart 60,ifconfig xxx.xxx.xxx.xxx xxx.xxx.xxx.xxx,peer-id xxxx,cipher xxxxxx'

As you can see, the VPN provides an IPv4 address for a primary DNS server (dhcp-option DNS). How does the router use this value? Does it have an impact on Knot? Since Knot is configured not to forward DNS requests to upstream DNS servers, it should not use the DNS server provided by the VPN, correct?
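As far as I know, on Linux OpenVPN does not apply a pushed `dhcp-option DNS` by itself; it only exposes it to up/down scripts as `foreign_option_N` environment variables. That would explain why resolv.conf stays unchanged. To see exactly what the server pushes, the PUSH_REPLY line from the log can be split into options. A minimal Python sketch, assuming options are comma-separated as in the log above (`parse_push_reply` and `pushed_dns` are hypothetical names; 192.0.2.53 in the test is a documentation address standing in for the masked one).

```python
from typing import Dict, List


def parse_push_reply(message: str) -> Dict[str, List[str]]:
    """Split an OpenVPN PUSH_REPLY control message into option name -> argument strings."""
    # The log wraps the message in single quotes; drop them if present.
    parts = message.strip().strip("'").split(",")
    if not parts or parts[0] != "PUSH_REPLY":
        raise ValueError("not a PUSH_REPLY message")
    options: Dict[str, List[str]] = {}
    for part in parts[1:]:
        name, _, args = part.partition(" ")
        options.setdefault(name, []).append(args)  # an option may be pushed several times
    return options


def pushed_dns(message: str) -> List[str]:
    """Return the addresses pushed with `dhcp-option DNS <address>`."""
    dns = []
    for args in parse_push_reply(message).get("dhcp-option", []):
        kind, _, value = args.partition(" ")
        if kind == "DNS":
            dns.append(value)
    return dns
```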

You can also try resolving names manually against the DNS server provided by the VPN.

And have you checked this setting in Foris?

What do you mean by “resolving manually”?

This setting is only useful when using the router as a VPN server. In my case, I use the router as a VPN client that connects to my VPN provider.

This tool is installed on Omnia by default, I think, in case you don’t know of anything similar on your preferred machine:

dig @resolver.I.P example.cz
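If dig is not available on some machine, the same manual check can be done with a raw UDP query. This is a minimal Python sketch of a DNS query (RFC 1035 wire format, RD bit set, class IN, no EDNS) that only reads the response header; `build_query`, `parse_header` and `ask` are hypothetical names, and it is not a replacement for dig.

```python
import random
import socket
import struct
from typing import Tuple

RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def build_query(name: str, qtype: int = 1, qid: int = 0) -> bytes:
    """Build a DNS query: 12-byte header with RD set and one question (qtype 1 = A)."""
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME is a sequence of length-prefixed labels terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def parse_header(packet: bytes) -> Tuple[int, str, int]:
    """Return (id, rcode name, answer count) from a DNS response header."""
    qid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", packet[:12])
    rcode = flags & 0x000F
    return qid, RCODES.get(rcode, str(rcode)), ancount


def ask(server: str, name: str, timeout: float = 3.0) -> Tuple[str, int]:
    """Send one query over UDP port 53 and return (rcode name, answer count)."""
    qid = random.randint(0, 0xFFFF)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, qid=qid), (server, 53))
        data, _ = sock.recvfrom(4096)
    rid, rcode, ancount = parse_header(data)
    if rid != qid:
        raise ValueError("response id does not match query id")
    return rcode, ancount
```

For example, `ask("127.0.0.1", "forum.turris.cz")` on the router queries kresd, while passing the VPN's DNS address instead queries that server directly, just like `dig @…`.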

Thank you for the details. Will try that next time I meet the issue.

Well, I hit the issue again and immediately connected to my router over ssh to run some tests with the dig command.

This time, I was unable to access forum.turris.cz. Here is the output of "dig forum.turris.cz":

; <<>> DiG 9.11.2-P1 <<>> forum.turris.cz
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 37273
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;forum.turris.cz.               IN      A

;; Query time: 343 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Sun Dec 16 14:48:41 CET 2018
;; MSG SIZE  rcvd: 33

You can see that the request is sent to the local DNS server 127.0.0.1:53 (Knot), and the status is SERVFAIL.

Right after that, I tried the DNS server pushed by my VPN provider with "dig @<IP address of the vpn's DNS server> forum.turris.cz":

; <<>> DiG 9.11.2-P1 <<>> @xxx.xxx.xxx.xxx forum.turris.cz
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 13800
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;forum.turris.cz.               IN      A

;; ANSWER SECTION:
forum.turris.cz.        1732    IN      CNAME   proxy.turris.cz.
proxy.turris.cz.        1732    IN      A       217.31.192.69

;; Query time: 23 msec
;; SERVER: xxx.xxx.xxx.xxx#53(xxx.xxx.xxx.xxx)
;; WHEN: Sun Dec 16 14:47:19 CET 2018
;; MSG SIZE  rcvd: 80

The request is correctly sent to the DNS server provided by the VPN, and the domain name is resolved.

So it looks like the issue comes from Knot. How can I debug that? Are there logs that can be enabled to find out why the SERVFAIL status occurs?

Thanks.

This tutorial should still work to get verbose logs from kresd.

I had already installed the resolver-debug package in the past, so I just started it. Detailed log messages from kresd can then be seen in /var/log/messages.

I ran the dig command with different domain names as a test. Here is an example of a domain name resolution that failed:

2018-12-16 16:50:00 info kresd[12772]: [    0][plan] plan 'forum.turris.cz.' type 'A'
2018-12-16 16:50:00 info kresd[12772]: [14463][iter]   'forum.turris.cz.' type 'A' id was assigned, parent id 0
2018-12-16 16:50:00 info kresd[12772]: [14463][cach]   => skipping exact packet: rank 025 (min. 020), new TTL -5629
2018-12-16 16:50:00 info kresd[12772]: [14463][cach]   => skipping unfit CNAME RR: rank 060, new TTL -227
2018-12-16 16:50:00 info kresd[12772]: [14463][cach]   => skipping unfit NS RR: rank 040, new TTL -3683
2018-12-16 16:50:00 info kresd[12772]: [14463][cach]   => no NSEC* cached for zone: cz.
2018-12-16 16:50:00 info kresd[12772]: [14463][cach]   => skipping zone: cz., NSEC, hash 0;new TTL -1234304552, ret -2
2018-12-16 16:50:00 info kresd[10213]: Last message '[14463][cach]   => s' repeated 1 times, suppressed by syslog-ng on turris
2018-12-16 16:50:00 info kresd[12772]: [14463][resl]   => going insecure because there's no covering TA
2018-12-16 16:50:00 info kresd[12772]: [14463][zcut]   found cut: cz. (rank 002 return codes: DS 0, DNSKEY 0)
2018-12-16 16:50:00 info kresd[12772]: [60103][iter]   'forum.turris.cz.' type 'A' id was assigned, parent id 0
2018-12-16 16:50:00 info kresd[12772]: [     ][nsre] probing timeouted NS: 194.0.12.1, score 1900
2018-12-16 16:50:00 info kresd[12772]: [     ][nsre] probing timeouted NS: 194.0.13.1, score 1900
2018-12-16 16:50:00 info kresd[12772]: [     ][nsre] probing timeouted NS: 194.0.14.1, score 1900
2018-12-16 16:50:00 info kresd[12772]: [60103][resl]   => querying: '193.29.206.1' score: 1334 zone cut: 'cz.' qname: 'tURrIs.CZ.' qtype: 'NS' proto: 'udp'
2018-12-16 16:50:00 info kresd[12772]: [60103][iter]   <= loaded 4 glue addresses
2018-12-16 16:50:00 info kresd[12772]: [60103][iter]   <= rcode: NOERROR
2018-12-16 16:50:00 info kresd[12772]: [60103][iter]   <= retrying with non-minimized name
2018-12-16 16:50:00 info kresd[12772]: [60103][cach]   => not overwriting A b.ns.nic.cz.
2018-12-16 16:50:00 info kresd[12772]: [60103][cach]   => not overwriting A c.ns.nic.cz.
2018-12-16 16:50:00 info kresd[12772]: [60103][cach]   => not overwriting A d.ns.nic.cz.
2018-12-16 16:50:00 info kresd[12772]: [60103][cach]   => not overwriting A a.ns.nic.cz.
2018-12-16 16:50:00 info kresd[12772]: [60103][resl]   <= server: '193.29.206.1' rtt: 24 ms
2018-12-16 16:50:00 info kresd[12772]: [41054][iter]   'forum.turris.cz.' type 'A' id was assigned, parent id 0
2018-12-16 16:50:00 info kresd[12772]: [41054][resl]   => querying: '193.29.206.1' score: 679 zone cut: 'cz.' qname: 'foRUm.TURRIs.Cz.' qtype: 'A' proto: 'udp'
2018-12-16 16:50:00 info kresd[12772]: [41054][iter]   <= loaded 4 glue addresses
2018-12-16 16:50:00 info kresd[12772]: [41054][iter]   <= rcode: NOERROR
2018-12-16 16:50:00 info kresd[12772]: [41054][iter]   <= lame response: non-auth sent negative response
2018-12-16 16:50:00 info kresd[12772]: [41054][cach]   => not overwriting A b.ns.nic.cz.
2018-12-16 16:50:00 info kresd[12772]: [41054][cach]   => not overwriting A d.ns.nic.cz.
2018-12-16 16:50:00 info kresd[12772]: [41054][cach]   => not overwriting A a.ns.nic.cz.
2018-12-16 16:50:00 info kresd[12772]: [41054][cach]   => not overwriting A c.ns.nic.cz.
2018-12-16 16:50:00 info kresd[12772]: [41054][resl]   => server: '193.29.206.1' flagged as 'bad'
2018-12-16 16:50:00 info kresd[12772]: [31144][iter]   'forum.turris.cz.' type 'A' id was assigned, parent id 0
2018-12-16 16:50:00 info kresd[12772]: [31144][resl]   => no valid NS left
2018-12-16 16:50:00 info kresd[12772]: [28800][iter]   'forum.turris.cz.' type 'A' id was assigned, parent id 0
2018-12-16 16:50:00 info kresd[12772]: [28800][resl]   => no valid NS left
2018-12-16 16:50:00 info kresd[12772]: [    0][resl]   AD: request NOT classified as SECURE
2018-12-16 16:50:00 info kresd[12772]: [28800][resl]   finished: 0, queries: 1, mempool: 16392 B

You can see the message "server: '193.29.206.1' flagged as 'bad'". However, when the VPN is turned off, there is no such message for that IP address:

2018-12-16 17:01:33 info kresd[12772]: [    0][plan] plan 'forum.turris.cz.' type 'A'
2018-12-16 17:01:33 info kresd[12772]: [41846][iter]   'forum.turris.cz.' type 'A' id was assigned, parent id 0
2018-12-16 17:01:33 info kresd[12772]: [41846][cach]   => skipping exact packet: rank 025 (min. 020), new TTL -6322
2018-12-16 17:01:33 info kresd[12772]: [41846][cach]   => skipping unfit CNAME RR: rank 020, new TTL 1601
2018-12-16 17:01:33 info kresd[12772]: [41846][cach]   => skipping unfit NS RR: rank 040, new TTL -4376
2018-12-16 17:01:33 info kresd[12772]: [41846][cach]   => no NSEC* cached for zone: cz.
2018-12-16 17:01:33 info kresd[12772]: [41846][cach]   => skipping zone: cz., NSEC, hash 0;new TTL 116, ret -2
2018-12-16 17:01:33 info kresd[10213]: Last message '[41846][cach]   => s' repeated 1 times, suppressed by syslog-ng on turris
2018-12-16 17:01:33 info kresd[12772]: [41846][resl]   => going insecure because there's no covering TA
2018-12-16 17:01:33 info kresd[12772]: [41846][zcut]   found cut: cz. (rank 002 return codes: DS 0, DNSKEY 0)
2018-12-16 17:01:33 info kresd[12772]: [64790][iter]   'forum.turris.cz.' type 'A' id was assigned, parent id 0
2018-12-16 17:01:33 info kresd[12772]: [     ][nsre] probing timeouted NS: 194.0.12.1, score 1900
2018-12-16 17:01:33 info kresd[12772]: [     ][nsre] probing timeouted NS: 194.0.13.1, score 1900
2018-12-16 17:01:33 info kresd[12772]: [     ][nsre] probing timeouted NS: 194.0.14.1, score 1900
2018-12-16 17:01:33 info kresd[12772]: [64790][resl]   => querying: '193.29.206.1' score: 34 zone cut: 'cz.' qname: 'tuRRIS.cZ.' qtype: 'NS' proto: 'udp'
2018-12-16 17:01:33 info kresd[12772]: [64790][iter]   <= rcode: NOERROR
2018-12-16 17:01:33 info kresd[12772]: [64790][iter]   <= continuing with qname minimization
2018-12-16 17:01:33 info kresd[12772]: [64790][resl]   <= server: '193.29.206.1' rtt: 31 ms
2018-12-16 17:01:33 info kresd[12772]: [58600][iter]   'forum.turris.cz.' type 'A' id was assigned, parent id 0
2018-12-16 17:01:33 info kresd[12772]: [58600][resl]   => querying: '193.29.206.1' score: 32 zone cut: 'turris.cz.' qname: 'ForUM.tuRrIS.cz.' qtype: 'A' proto: 'udp'
2018-12-16 17:01:33 info kresd[12772]: [58600][iter]   <= rcode: NOERROR
2018-12-16 17:01:33 info kresd[12772]: [58600][cach]   => not overwriting A proxy.turris.cz.
2018-12-16 17:01:33 info kresd[12772]: [58600][cach]   => not overwriting CNAME forum.turris.cz.
2018-12-16 17:01:33 info kresd[12772]: [58600][resl]   <= server: '193.29.206.1' rtt: 34 ms
2018-12-16 17:01:33 info kresd[12772]: [    0][resl]   AD: request NOT classified as SECURE
2018-12-16 17:01:33 info kresd[12772]: [58600][resl]   finished: 0, queries: 1, mempool: 16392 B
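To compare the two logs above mechanically (for instance over a longer capture), a few string checks are enough. A minimal Python sketch, assuming the kresd verbose message texts exactly as they appear in these excerpts; the helper names are hypothetical.

```python
import re
from typing import List, Set

BAD_RE = re.compile(r"server: '([^']+)' flagged as 'bad'")


def bad_servers(log: str) -> Set[str]:
    """Return the nameserver addresses kresd flagged as 'bad' in a log excerpt."""
    return set(BAD_RE.findall(log))


def lame_events(log: str) -> List[str]:
    """Return the text of every 'lame response' message."""
    events = []
    for line in log.splitlines():
        idx = line.find("lame response:")
        if idx != -1:
            events.append(line[idx:].strip())
    return events


def gave_up(log: str) -> bool:
    """True if kresd ran out of usable nameservers ('no valid NS left')."""
    return "no valid NS left" in log
```

On the VPN log, this reports 193.29.206.1 as bad, one "non-auth sent negative response" event, and a give-up; on the log without the VPN, none of these appear.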

So why is the server flagged as bad when the VPN is on? I don't know enough about how DNS servers work to understand the problem. Is it related to the score value?

Edit: disabling DNSSEC in Foris does not solve the problem.

I did the upgrade to Turris OS 3.11, and Knot Resolver has been updated from v2.4.1 to v3.1.0. Unfortunately the issue still occurs.

Opened issue on Gitlab: https://gitlab.labs.nic.cz/turris/openwrt/issues/236

This is already a bad sign. It's an incorrect answer from the authoritative server for .cz (193.29.206.1), but I've never seen our servers give such bad answers. Combined with the fact that it only occurs over the VPN, it seems almost certain that the VPN intercepts those DNS queries and answers them directly (and even wrongly) :face_vomiting:
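One way to check for this kind of interception is to look at the flags of the raw response coming back from 193.29.206.1. A real authoritative answer for a name inside .cz should carry the AA (authoritative answer) bit, and a referral should carry NS records in the authority section. A minimal Python sketch decoding the 12-byte header defined in RFC 1035, section 4.1.1; `non_auth_empty` is a rough heuristic of my own, not kresd's actual lame-response logic.

```python
import struct
from typing import Dict


def decode_header(packet: bytes) -> Dict[str, int]:
    """Decode the 12-byte DNS header (RFC 1035, section 4.1.1) into named fields."""
    if len(packet) < 12:
        raise ValueError("packet shorter than a DNS header")
    qid, flags, qd, an, ns, ar = struct.unpack("!HHHHHH", packet[:12])
    return {
        "id": qid,
        "qr": (flags >> 15) & 1,        # 1 = this is a response
        "opcode": (flags >> 11) & 0xF,
        "aa": (flags >> 10) & 1,        # authoritative answer
        "tc": (flags >> 9) & 1,         # truncated
        "rd": (flags >> 8) & 1,         # recursion desired
        "ra": (flags >> 7) & 1,         # recursion available
        "rcode": flags & 0xF,           # 0 NOERROR, 2 SERVFAIL, 3 NXDOMAIN, ...
        "qdcount": qd, "ancount": an, "nscount": ns, "arcount": ar,
    }


def non_auth_empty(h: Dict[str, int]) -> bool:
    """Heuristic: a response that is neither an authoritative answer nor a referral
    (AA clear, NOERROR/NXDOMAIN, no answer and no authority records)."""
    return (h["qr"] == 1 and h["aa"] == 0 and h["rcode"] in (0, 3)
            and h["ancount"] == 0 and h["nscount"] == 0)
```

With dig, the same information is visible in the `flags:` line: querying `dig @193.29.206.1 forum.turris.cz` through the VPN and checking whether it is a proper referral (NS records in the AUTHORITY section) would show whether something in the path rewrites the answer.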

Interestingly enough, I have been using my VPN provider's server located in the Czech Republic for two days, and I haven't had any problems with domain name resolution :thinking:

Here is an example of the Kresd log for the resolution of the domain name forum.turris.cz:

2018-12-18 22:03:58 info kresd[15406]: [00000.00][plan] plan 'forum.turris.cz.' type 'A' uid [53913.00]
2018-12-18 22:03:58 info kresd[15406]: [53913.00][iter]   'forum.turris.cz.' type 'A' new uid was assigned .01, parent uid .00
2018-12-18 22:03:58 info kresd[15406]: [53913.01][cach]   => skipping unfit CNAME RR: rank 020, new TTL 1616
2018-12-18 22:03:58 info kresd[15406]: [53913.01][cach]   => no NSEC* cached for zone: turris.cz.
2018-12-18 22:03:58 info kresd[15406]: [53913.01][cach]   => skipping zone: turris.cz., NSEC, hash 0;new TTL -123456789, ret -2
2018-12-18 22:03:58 info kresd[2743]: Last message '[53913.01][cach]   =' repeated 1 times, suppressed by syslog-ng on turris
2018-12-18 22:03:58 info kresd[15406]: [53913.01][resl]   => going insecure because there's no covering TA
2018-12-18 22:03:58 info kresd[15406]: [53913.01][zcut]   found cut: turris.cz. (rank 002 return codes: DS -2, DNSKEY -2)
2018-12-18 22:03:58 info kresd[15406]: [53913.01][resl]   => id: '18032' querying: '194.0.12.1' score: 47 zone cut: 'turris.cz.' qname: 'FORUm.TURRIs.Cz.' qtype: 'A' proto: 'udp'
2018-12-18 22:03:59 info kresd[15406]: [53913.01][iter]   <= rcode: NOERROR
2018-12-18 22:03:59 info kresd[15406]: [53913.01][cach]   => not overwriting A proxy.turris.cz.
2018-12-18 22:03:59 info kresd[15406]: [53913.01][cach]   => not overwriting CNAME forum.turris.cz.
2018-12-18 22:03:59 info kresd[15406]: [53913.01][resl]   <= server: '194.0.12.1' rtt: 47 ms
2018-12-18 22:03:59 info kresd[15406]: [53913.01][resl]   AD: request NOT classified as SECURE
2018-12-18 22:03:59 info kresd[15406]: [53913.01][resl]   finished: 0, queries: 1, mempool: 32784 B

I'm now switching back to the VPN server I used before to see if the errors occur again…