Knot Resolver not responding

I am bamboozled. This seems to happen intermittently and be resolved by a reboot, but I have it before me, now, and would diagnose if I knew how but find myself a tad stuck. Here are the symptoms:

  1. I use the standard Omnia setup with DHCP on my local LAN and have the names served to the LAN. Works a charm and I can access the router for example by its name or any one of my LAN peripherals by name (the NAS for example). I love it. But now and again name resolution fails. I have such a moment at hand and a moment to try a diagnosis and this is what I have (I wish I had more):

  2. The router is at 192.168.0.1 and has the name “cerberus”. From my desktop I can do this to illustrate:

    $ ping 192.168.0.1
    PING 192.168.0.1 (192.168.0.1) 56(84) bytes of data.
    64 bytes from 192.168.0.1: icmp_seq=1 ttl=64 time=0.407 ms
    64 bytes from 192.168.0.1: icmp_seq=2 ttl=64 time=0.464 ms
    ^C
    --- 192.168.0.1 ping statistics ---
    2 packets transmitted, 2 received, 0% packet loss, time 999ms
    rtt min/avg/max/mdev = 0.407/0.435/0.464/0.035 ms
    $ ping cerberus
    ping: unknown host cerberus
    $ dig @192.168.0.1

    ; <<>> DiG 9.10.3-P4-Ubuntu <<>> @192.168.0.1
    ; (1 server found)
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 32787
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 13, AUTHORITY: 0, ADDITIONAL: 1

    ;; OPT PSEUDOSECTION:
    ; EDNS: version: 0, flags:; udp: 4096
    ;; QUESTION SECTION:
    ;. IN NS

    ;; ANSWER SECTION:
    . 129696 IN NS a.root-servers.net.
    . 129696 IN NS b.root-servers.net.
    . 129696 IN NS c.root-servers.net.
    . 129696 IN NS d.root-servers.net.
    . 129696 IN NS e.root-servers.net.
    . 129696 IN NS f.root-servers.net.
    . 129696 IN NS g.root-servers.net.
    . 129696 IN NS h.root-servers.net.
    . 129696 IN NS i.root-servers.net.
    . 129696 IN NS j.root-servers.net.
    . 129696 IN NS k.root-servers.net.
    . 129696 IN NS l.root-servers.net.
    . 129696 IN NS m.root-servers.net.

    ;; Query time: 13 msec
    ;; SERVER: 192.168.0.1#53(192.168.0.1)
    ;; WHEN: Tue May 30 16:06:25 AEST 2017
    ;; MSG SIZE rcvd: 239
    $ dig @192.168.0.1 cerberus

    ; <<>> DiG 9.10.3-P4-Ubuntu <<>> @192.168.0.1 cerberus
    ; (1 server found)
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 292
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

    ;; OPT PSEUDOSECTION:
    ; EDNS: version: 0, flags:; udp: 4096
    ;; QUESTION SECTION:
    ;cerberus. IN A

    ;; AUTHORITY SECTION:
    . 83638 IN SOA a.root-servers.net. nstld.verisign-grs.com. 2017053000 1800 900 604800 86400

    ;; Query time: 13 msec
    ;; SERVER: 192.168.0.1#53(192.168.0.1)
    ;; WHEN: Tue May 30 16:10:26 AEST 2017
    ;; MSG SIZE rcvd: 112

And on the Omnia itself:

# netstat -lp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 0.0.0.0:www             0.0.0.0:*               LISTEN      2589/lighttpd
tcp        0      0 0.0.0.0:domain          0.0.0.0:*               LISTEN      23571/kresd
tcp        0      0 0.0.0.0:ssh             0.0.0.0:*               LISTEN      1873/sshd
tcp        0      0 0.0.0.0:https           0.0.0.0:*               LISTEN      2589/lighttpd
tcp        0      0 0.0.0.0:microsoft-ds    0.0.0.0:*               LISTEN      6104/smbd
tcp        0      0 0.0.0.0:netbios-ssn     0.0.0.0:*               LISTEN      6104/smbd
tcp        0      0 :::www                  :::*                    LISTEN      2589/lighttpd
tcp        0      0 :::domain               :::*                    LISTEN      23571/kresd
tcp        0      0 :::ssh                  :::*                    LISTEN      1873/sshd
tcp        0      0 :::https                :::*                    LISTEN      2589/lighttpd
tcp        0      0 :::microsoft-ds         :::*                    LISTEN      6104/smbd
tcp        0      0 :::netbios-ssn          :::*                    LISTEN      6104/smbd
udp        0      0 0.0.0.0:domain          0.0.0.0:*                           23571/kresd
udp        0      0 0.0.0.0:bootps          0.0.0.0:*                           20343/dnsmasq
udp        0      0 0.0.0.0:44154           0.0.0.0:*                           24705/busybox
udp        0      0 192.168.0.255:netbios-ns 0.0.0.0:*                           6106/nmbd
udp        0      0 192.168.0.1:netbios-ns  0.0.0.0:*                           6106/nmbd
udp        0      0 0.0.0.0:netbios-ns      0.0.0.0:*                           6106/nmbd
udp        0      0 192.168.0.255:netbios-dgm 0.0.0.0:*                           6106/nmbd
udp        0      0 192.168.0.1:netbios-dgm 0.0.0.0:*                           6106/nmbd
udp        0      0 0.0.0.0:netbios-dgm     0.0.0.0:*                           6106/nmbd
udp        0      0 0.0.0.0:7001            0.0.0.0:*                           -
udp        0      0 :::dhcpv6-server        :::*                                1732/odhcpd
udp        0      0 :::domain               :::*                                23571/kresd
raw        0      0 :::58                   ::%3069291152:*         58          1732/odhcpd
raw        0      0 :::58                   ::%3069291152:*         58          1732/odhcpd
Active UNIX domain sockets (only servers)
Proto RefCnt Flags       Type       State         I-Node PID/Program name    Path
unix  2      [ ACC ]     STREAM     LISTENING       1579 1900/syslog-ng      /var/syslog-ng.ctl
unix  2      [ ACC ]     STREAM     LISTENING     4522081 23571/kresd         tty/23571
unix  2      [ ACC ]     STREAM     LISTENING       9570 6106/nmbd           /var/nmbd/unexpected
unix  2      [ ACC ]     STREAM     LISTENING       1132 794/ubusd           /var/run/ubus.sock
unix  2      [ ACC ]     STREAM     LISTENING       3217 2590/python         /tmp/fastcgi.python.socket-0
# ping cerberus
PING cerberus (192.168.0.1): 56 data bytes
64 bytes from 192.168.0.1: seq=0 ttl=64 time=0.088 ms
64 bytes from 192.168.0.1: seq=1 ttl=64 time=0.110 ms
^C
--- cerberus ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.088/0.099/0.110 ms
# dig cerberus

; <<>> DiG 9.9.8-P4 <<>> cerberus
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 34837
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;cerberus.			IN	A

;; AUTHORITY SECTION:
.			83658	IN	SOA	a.root-servers.net. nstld.verisign-grs.com. 2017053000 1800 900 604800 86400

;; Query time: 13 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Tue May 30 16:10:06 AEST 2017
;; MSG SIZE  rcvd: 112
# dig @192.168.0.1 cerberus

; <<>> DiG 9.9.8-P4 <<>> @192.168.0.1 cerberus
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 51910
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;cerberus.			IN	A

;; AUTHORITY SECTION:
.			83206	IN	SOA	a.root-servers.net. nstld.verisign-grs.com. 2017053000 1800 900 604800 86400

;; Query time: 12 msec
;; SERVER: 192.168.0.1#53(192.168.0.1)
;; WHEN: Tue May 30 16:17:38 AEST 2017
;; MSG SIZE  rcvd: 112

Anyhow cerberus is resolved on the router but not served to the desktop. Which is a puzzle. I shall reboot the router and see if it comes good and report with dig output here again.

OK, rebooted the router and as expected, from my desktop now:

$ ping cerberus
PING cerberus.lan (192.168.0.1) 56(84) bytes of data.
64 bytes from 192.168.0.1: icmp_seq=1 ttl=64 time=0.539 ms
64 bytes from 192.168.0.1: icmp_seq=2 ttl=64 time=0.572 ms
64 bytes from 192.168.0.1: icmp_seq=3 ttl=64 time=0.504 ms
^C
--- cerberus.lan ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 0.504/0.538/0.572/0.033 ms
$ dig @192.168.0.1 cerberus

; <<>> DiG 9.10.3-P4-Ubuntu <<>> @192.168.0.1 cerberus
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 30653
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;cerberus.			IN	A

;; AUTHORITY SECTION:
.			82880	IN	SOA	a.root-servers.net. nstld.verisign-grs.com. 2017053000 1800 900 604800 86400

;; Query time: 13 msec
;; SERVER: 192.168.0.1#53(192.168.0.1)
;; WHEN: Tue May 30 16:23:04 AEST 2017
;; MSG SIZE  rcvd: 112

and alas I see no difference of consequence in dig output. Though in neither case does it return an answer, only an Authority. Grrr.

Turns out (and I should have done this before reboot and will next time it happens) I should have done:

$ dig @192.168.0.1 cerberus.lan

; <<>> DiG 9.10.3-P4-Ubuntu <<>> cerberus.lan
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62475
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;cerberus.lan.			IN	A

;; ANSWER SECTION:
cerberus.lan.		0	IN	A	192.168.0.1

;; Query time: 0 msec
;; SERVER: 192.168.0.1#53(192.168.0.1)
;; WHEN: Tue May 30 16:28:44 AEST 2017
;; MSG SIZE  rcvd: 57

As that returns an answer.

Mind you ping cerberus works as well as ping cerberus.lan so something fishy is going on.

My tip is that your /etc/resolv.conf contains search lan, or something.

Why thank you! That it does indeed. Here it is:

# cat /etc/resolv.conf
search lan
nameserver 127.0.0.1

What puzzles me though is:

  1. Where is the documentation on resolv.conf? The best I’ve found is not even close to useful documentation on the format of the file alas:

    https://wiki.openwrt.org/doc/uci/dhcp

  2. Given I can’t find documentation, what does “search lan” mean and do? I mean it’s just the default config on the Omnia.

  3. Why would any given setting cause a breakdown in DNS functioning that is fixed by a reboot, to break down again at some arbitrary time later, to be fixed by a reboot? Etc.

  4. I have to wait for the next time it happens to test and compare “ping cerberus” with “ping cerberus.lan” and “dig @192.168.0.1 cerberus” with “dig @192.168.0.1 cerberus.lan”. I’m very curious what I learn, and whether .lan names continue to resolve after the glitch surfaces. Right now both are resolving fine.

    $ nslookup cerberus
    Server: 192.168.0.1
    Address: 192.168.0.1#53

    Non-authoritative answer:
    Name: cerberus.lan
    Address: 192.168.0.1

    $ nslookup cerberus.lan
    Server: 192.168.0.1
    Address: 192.168.0.1#53

    Non-authoritative answer:
    Name: cerberus.lan
    Address: 192.168.0.1

Somehow, somewhen, someone is adding .lan to the non-specific name, which is great, but that is perhaps the bit that goes awry and maybe, .lan’s continue to resolve. Time will tell, when it next surfaces. It’s happened a few times before (albeit not often) and so I expect it will again. Which is another puzzle element, namely it’s happened before albeit not often, mostly it just works … like now.

  1. I just run man resolv.conf on my Linux desktop and I assume the meaning doesn’t differ (significantly).

Note that dig and nslookup don’t use the OS resolving function, so they bypass (most) /etc/resolv.conf stuff and query the resolver directly. (I’m simplifying the situation.)
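
To see that difference directly, dig can be told to apply the search list with its `+search` option, which is off by default. A small sketch, assuming BIND's dig is installed; the function name `dig_both_ways` and the `DIG` variable are hypothetical helpers for illustration, not part of dig:

```shell
# DIG is a made-up override hook (not a dig feature) so the command can be swapped out.
DIG=${DIG:-dig}

# Query SERVER for NAME twice: once exactly as typed, once with the
# resolv.conf search list applied by dig itself (+search).
# Usage: dig_both_ways NAME SERVER
dig_both_ways() {
    name=$1
    server=$2
    printf 'as typed:    %s\n' "$("$DIG" +short "@$server" "$name")"
    printf 'with search: %s\n' "$("$DIG" +short +search "@$server" "$name")"
}
```

Running `dig_both_ways cerberus 192.168.0.1` alongside `getent hosts cerberus` (which does go through the OS resolver) should make it visible which path adds the `.lan` suffix and which does not.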

I think DHCP is able to suggest to clients that they should search some domains, but I don’t know much about the processes that modify /etc/resolv.conf or even similar settings on non-Linux clients.

Resolvers themselves aren’t supposed to resolve the plain cerberus name (for example), at least by default, because IANA may perfectly well decide to create a new top-level domain of that very name (there are many TLDs nowadays). It’s up to DNS clients to handle such “aliases” if they like them.

Thanks, man resolv.conf sure works on the desktop and helps a little. But suddenly I find myself wishing I knew much more and/or how to find out easily.

It seems the router (which is acting as DNS on the LAN) has:

    # cat /etc/resolv.conf
    search lan
    nameserver 127.0.0.1

and the desktop has:

    $ cat /etc/resolv.conf
    # Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
    #     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
    nameserver 192.168.0.1
    search lan
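
The two files carry the same two kinds of setting and differ only in which server they point at; the relative order of the `search` and `nameserver` lines has no meaning. A minimal sketch that prints just those two settings from a resolv.conf-style file (the function name `show_resolv_conf` is made up for illustration):

```shell
# Print the search domains and nameserver addresses found in a
# resolv.conf-style file, ignoring comments and other directives.
# Usage: show_resolv_conf [FILE]   (FILE defaults to /etc/resolv.conf)
show_resolv_conf() {
    file=${1:-/etc/resolv.conf}
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while read -r keyword rest || [ -n "$keyword" ]; do
        case $keyword in
            search)     printf 'search=%s\n' "$rest" ;;
            nameserver) printf 'nameserver=%s\n' "$rest" ;;
        esac
    done < "$file"
}
```

Run on the desktop and on the router, it should show `search=lan` on both, with only the `nameserver=` line differing.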

But the man page, although dedicating some words to it, does not explain what the search directive actually does. It says simply:

search Search list for host-name lookup.
              The search list is normally determined from the local domain name; by default, it contains only the  local  domain
              name.   This  may be changed by listing the desired domain search path following the search keyword with spaces or
              tabs separating the names.  Resolver queries having fewer than ndots dots (default is 1) in them will be attempted
              using each component of the search path in turn until a match is found.  For environments with multiple subdomains
              please read options ndots:n below to avoid man-in-the-middle attacks and unnecessary  traffic  for  the  root-dns-
              servers.   Note  that  this  process may be slow and will generate a lot of network traffic if the servers for the
              listed domains are not local, and that queries will time out if no server is available for one of the domains.

              The search list is currently limited to six domains with a total of 256 characters.

Which sadly does not say what “search lan” means. Only that the resolver queries will be attempted for “lan” whatever that means? How? How does a resolver search “lan”? I guess there’s no one answer and it depends on the resolver? Wow it would be nice to understand how this all hangs together.
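
For what it’s worth, going by that man page text, “lan” is not something that gets searched. It is a suffix that the client-side resolver library appends to short names before sending them to the nameserver. A minimal sketch of that candidate-name logic, assuming glibc-like behaviour (the function name `candidates` is made up for illustration, and real resolvers have more options than this):

```shell
# List, in order, the fully qualified names a glibc-style stub resolver
# would try for NAME, given ndots and a search list.
# Usage: candidates NAME NDOTS DOMAIN...
candidates() {
    name=$1
    ndots=$2
    shift 2

    # A trailing dot marks the name as already fully qualified.
    case $name in
        *.) printf '%s\n' "$name"; return ;;
    esac

    # Count the dots in the name.
    only_dots=$(printf '%s' "$name" | tr -cd '.')
    dots=${#only_dots}

    # Names with at least ndots dots are tried as typed first.
    if [ "$dots" -ge "$ndots" ]; then
        printf '%s.\n' "$name"
    fi

    # Each search domain is appended in turn.
    for domain in "$@"; do
        printf '%s.%s.\n' "$name" "$domain"
    done

    # Short names are tried as typed only after the search list.
    if [ "$dots" -lt "$ndots" ]; then
        printf '%s.\n' "$name"
    fi
}
```

With the settings shown above, `candidates cerberus 1 lan` prints `cerberus.lan.` followed by `cerberus.`, which would explain why ping finds the router under the short name while a plain dig for `cerberus.` gets NXDOMAIN from the root.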

I would guess from the above that name resolution on my desktop is governed by its resolv.conf, so it tries “nameserver 192.168.0.1” and, if there is no response from it, then “search lan”. The first step I understand, I think: it’ll open port 53 on 192.168.0.1 and submit a query for a name resolution.

Then on the router (at 192.168.0.1) kresd is listening on port 53, and it gets the query and in turn it is obliged (as the resolver) to obey resolv.conf, and on the router that says “search lan” first then “nameserver 127.0.0.1”, which is confusing because that’s the loopback IP, meaning kresd asks itself for a name resolution? Why is my head spinning?

Then things get very hairy, and a reminder as to just how evolved and convoluted the *nix family of systems has become, I guess. Namely, how does kresd know to resolve local domain names, and how does it know what broader internet DNS to use to service requests from the LAN?

I can read but have only so much time on my hands and I can see that /etc/init.d/resolver, calls /etc/init.d/kresd which in turn has lines like:

config_load dhcp
config_foreach get_local_domain dnsmasq
if [ "$STATIC_DOMAINS" == "1" ]; then
	config_foreach set_local_host host
	config_foreach set_local_host domain
fi
if [ "$dynamic_domains" == "1" ]; then
	config_foreach set_dnsmasq_dhcp_script dnsmasq "$DHCP_SCRIPT"
fi

that are suggestive of loading the name-to-IP mappings from /tmp/dhcp.leases and possibly /etc/config/dhcp, which contains the dnsmasq static lease definitions (where LuCI seems to drop them).
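
To make the idea concrete, here is a rough sketch of turning a dnsmasq lease file into name-to-IP pairs. It is not the Omnia’s actual script, only an illustration under the assumption that the file uses the usual dnsmasq layout (expiry, MAC, IP, hostname, client ID per line); the function name `leases_to_names` and the default domain `lan` are made up here:

```shell
# Turn dnsmasq-style lease lines into "name.domain address" pairs.
# Usage: leases_to_names LEASE_FILE [DOMAIN]
leases_to_names() {
    lease_file=$1
    domain=${2:-lan}
    # Fields: expiry, MAC, IP address, hostname ("*" when unknown), client ID.
    # Leases without a known hostname are skipped.
    awk -v domain="$domain" \
        'NF >= 4 && $4 != "*" { print $4 "." domain, $3 }' "$lease_file"
}
```

Something like `leases_to_names /tmp/dhcp.leases lan` on the router would then list the names a resolver could be fed as local hints, which may help when comparing against what kresd actually answers.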

But this uses shell function after shell function nested in files like /etc/rc.common, /lib/functions.sh and /lib/config/uci.sh. Egads! Getting one’s head around all the abstractions here is a time-consuming job to say the least. And then there is DHCP_SCRIPT which is /etc/kresd/dhcp_host_domain_ng.sh and it seems to load DHCP leases too …

Aaargh, what a spaghetti of interactions involving language like static domains, dynamic domains, dynamic hints and more …

In any case, for now it seems that the safest course to follow is explicitly use .lan on my local names and see how that fares, though part of me wants to stay as is, and see if name resolution without .lan fails again some time soon and I can try digging some more. Perhaps dig has some more verbose modes I can explore.

It seems that dig (and drill) both insist that cerberus does not deserve (or get) an answer but cerberus.lan does and returns the IP of the router thus named (and same for my other named devices on the LAN). So I remain curious how nslookup and ping and ssh all resolve cerberus … is it, as inferred above, because my desktop resolv.conf has “search lan” at top and they search the lan before asking the DNS? And if so, what does searching the lan mean and how does it work?

Oh, I’m full of questions alas.
