Repeated problems with kresd

For some time I’ve been having trouble with my Omnia. Since 3.11.1 it has gotten even worse.

My log is full of messages like the ones below (several thousand lines). They recur at intervals of 30 seconds up to 3-4 minutes. This is just an excerpt:
2019-01-05 20:09:02 err dhcp_host_domain_ng.py: Wrong host format '/mnt/ssd/kresd/hints.tmp' in host file 192.168.10.57 Grandstream-HT503_2.lan
2019-01-05 20:09:02 err dhcp_host_domain_ng.py: Kresd socket failed:<class 'socket.error'>,[Errno 111] Connection refused

2019-01-05 20:09:02 err dhcp_host_domain_ng.py[]: Wrong host format '/mnt/ssd/kresd/hints.tmp' in host file 192.168.10.58 Raspberry_PI_3.lan
2019-01-05 20:09:02 err dhcp_host_domain_ng.py[]: Kresd socket failed:<class 'socket.error'>,[Errno 111] Connection refused
2019-01-05 20:09:02 err dhcp_host_domain_ng.py[]: Wrong host format '/mnt/ssd/kresd/hints.tmp' in host file 192.168.10.59 Raspberry_PI_3_WLAN.lan
2019-01-05 20:09:02 err dhcp_host_domain_ng.py[]: Kresd socket failed:<class 'socket.error'>,[Errno 111] Connection refused
2019-01-05 20:09:02 err dhcp_host_domain_ng.py[]: Wrong host format '/mnt/ssd/kresd/hints.tmp' in host file 192.168.10.61 Samsung_TV.lan
2019-01-05 20:09:02 err dhcp_host_domain_ng.py[]: Kresd socket failed:<class 'socket.error'>,[Errno 111] Connection refused
2019-01-05 20:09:02 err dhcp_host_domain_ng.py[]: Wrong host format '/mnt/ssd/kresd/hints.tmp' in host file 192.168.10.67 openHABian.lan
2019-01-05 20:09:02 err dhcp_host_domain_ng.py[]: Kresd socket failed:<class 'socket.error'>,[Errno 111] Connection refused
2019-01-05 20:09:02 err dhcp_host_domain_ng.py[]: Wrong host format '/mnt/ssd/kresd/hints.tmp' in host file 192.168.10.62 echo_dot.lan
2019-01-05 20:09:02 err dhcp_host_domain_ng.py[]: Kresd socket failed:<class 'socket.error'>,[Errno 111] Connection refused

It seems that all reported hosts are in the ‘.lan’ domain, which is configured in Foris as the domain for local machines. I have ~30 of them and all of them are in the /mnt/ssd/kresd/hints.tmp file. If I rename it, it gets recreated with the same contents. It consists of just ‘ip’ ‘hostname’ pairs, and even though it’s auto-created, kresd complains about it. What’s broken here?

There’s also another file, ‘/tmp/dhcp.leases.dynamic’, which causes the exact same errors (interestingly, also only for ‘.lan’ hosts).

2019-01-05 23:20:03 err dhcp_host_domain_ng.py[]: Wrong host format '/tmp/dhcp.leases.dynamic' in host file 192.168.10.52 Grandstream-HT503.lan
2019-01-05 23:20:03 err dhcp_host_domain_ng.py[]: Kresd socket failed:<class 'socket.error'>,[Errno 111] Connection refused
2019-01-05 23:20:03 err dhcp_host_domain_ng.py[]: Wrong host format '/tmp/dhcp.leases.dynamic' in host file 192.168.10.38 Diskstation.lan
2019-01-05 23:20:03 err dhcp_host_domain_ng.py[]: Kresd socket failed:<class 'socket.error'>,[Errno 111] Connection refused
2019-01-05 23:20:03 err dhcp_host_domain_ng.py[]: Wrong host format '/tmp/dhcp.leases.dynamic' in host file 192.168.10.170 iPadvonikeRisse.lan
2019-01-05 23:20:03 err dhcp_host_domain_ng.py[]: Kresd socket failed:<class 'socket.error'>,[Errno 111] Connection refused

Here are the contents of ‘/mnt/ssd/kresd/hints.tmp’:

192.168.10.34 Desktop.lan
192.168.10.52 Grandstream-HT503.lan
192.168.10.37 Brother_HL_3150CDW.lan
192.168.10.38 Diskstation.lan
192.168.10.39 FreePBX.lan
192.168.10.43 iTach_IP2IR.lan
192.168.10.46 RX-V675.lan
192.168.10.50 nuctux.lan
192.168.10.51 KMTronic_Webrelay.lan
192.168.10.44 ReadyNAS2.lan
192.168.10.49 Zappiti.lan
192.168.10.53 ReadyNAS3.lan
192.168.10.54 Raspberry_PI.lan
192.168.10.55 Raspberry_PI_WLAN.lan
192.168.10.48 Mede8er-X3D.lan
192.168.10.56 NTP_WIFI_TIME_SYNC.lan
192.168.10.36 Remoteboot.lan
192.168.10.57 Grandstream-HT503_2.lan
192.168.10.58 Raspberry_PI_3.lan
192.168.10.59 Raspberry_PI_3_WLAN.lan
192.168.10.61 Samsung_TV.lan
192.168.10.67 openHABian.lan
192.168.10.62 echo_dot.lan
192.168.10.63 Diskstation_LAN2.lan
192.168.10.64 Raspberry_Pi_Desktop.lan
192.168.10.65 Raspberry_Pi_Desktop_LAN.lan
192.168.10.60 odroid.lan
192.168.10.66 Raspberry_Pi_Scanner.lan
192.168.10.68 Raspberry_Pi_W.lan
192.168.10.69 Raspberry_Pi_ModMyPi.lan
192.168.10.175 Portal.lan
192.168.10.70 Galaxy_S8.lan
192.168.10.71 Galaxy_S9.lan
192.168.10.50 ad1.ad.shadowsrealm.ch.lan
192.168.10.59 aircontrol.lan
192.168.10.1 turris.lan
192.168.10.175 portal.lan
192.168.10.64 ccu2.lan

The other file with the same problem /tmp/dhcp.leases.dynamic looks like this:

192.168.10.151 android-4a56d7f20694a042.lan
192.168.10.39 FreePBX.lan
192.168.10.60 odroid.lan
192.168.10.46 RX-V675.lan
192.168.10.53 ReadyNAS3.lan
192.168.10.132 DS600-19C0C7.lan
192.168.10.175 Portal.lan
192.168.10.134 S850A-GO.lan
192.168.10.155 Chromecast.lan
192.168.10.52 Grandstream-HT503.lan
192.168.10.38 Diskstation.lan
192.168.10.170 iPadvonikeRisse.lan

The /tmp/kresd.config looks like this:

-- Automatically generated file; DO NOT EDIT
modules = {
'hints > iterate'
, 'policy'
, 'stats'
, predict = {
window = 30 -- 30 minutes sampling window
, period = 24*(60/30) -- track last 24 hours
}
}
hints.use_nodata(true)
cache.ns_tout(5000)
hints.config('/mnt/ssd/kresd/hints.tmp')
net.bufsize(4096)
net.ipv4=true
net.ipv6=false
cache.open(1*GB)
policy.add(policy.all(policy.FORWARD({
'54.93.173.153',
'81.17.17.170',
})))

--- Included custom configuration file from: ---
--- /etc/kresd/custom-forwarding.conf
local ad_rule = policy.add(policy.suffix(policy.STUB('192.168.10.50'), {todname('ad.shadowsrealm.ch')}))
policy.del(ad_rule.id)
table.insert(policy.rules, 1, ad_rule)
local lan_rule = policy.add(policy.suffix(policy.STUB('127.0.0.1@5353'), policy.todnames({'lan','168.192.in-addr.arpa'})))
policy.del(lan_rule.id)
table.insert(policy.rules, 1, lan_rule)

As you might guess, I have dnsmasq running on port 5353 for local resolution. Interestingly, when I restart kresd I get the error ‘uci: Entry not found’. This is the output of /etc/init.d/resolver restart:

Called /etc/init.d/kresd stop
set dhcp script
Called /etc/init.d/kresd start
set dhcp script
uci: Entry not found
Called /etc/resolver/dhcp_host_domain_ng.py

And since 3.11.1 it has gotten so much worse that the DNS check in Foris is now broken as well. No matter which config I select there, forwarding or not, I get two red cross icons, “DNS: failed” and “DNSSEC: failed”. But as the router is more or less working normally (except for those thousands of error lines), I don’t know why Foris thinks that DNS is failing.

Any help here would be greatly appreciated, as I think writing thousands of lines to the log is really a bad sign. I would like to get the Foris DNS check working again and maybe also fix this ‘uci: Entry not found’ error. I’m really out of ideas and couldn’t find a solution here in the forum. There was another case reported here with 3.11, but no solution was given. It also seemed to be fixed with 3.11.1, whereas my situation got worse, with the Foris DNS check now failing as well.

UPDATE:
It seems the culprit of all this is “/etc/resolver/dhcp_host_domain_ng.py”. It causes the ‘uci: Entry not found’ message if you don’t have all values set in /etc/config/resolver for static & dynamic leases. It also causes kresd to fail to add the hints. It makes the call hints.add_hosts('/mnt/ssd/kresd/hints.tmp'), which creates all this log output because kresd doesn’t seem to like this file. So either the file format with just IP & hostname is wrong, or the command is. It would be great to find out which.

Could someone please post the latest version of “/etc/resolver/dhcp_host_domain_ng.py” for 3.11.1? I have the feeling that mine might be outdated; why else would other people not have these problems?

UPDATE2:
It turns out that if I disable the dynamic leases option in /etc/config/resolver, the errors disappear, because the whole Python script is not called anymore. So it seems that dynamic leases are badly broken, as this script does nothing but bullshit, but I would still like to know what actually went wrong. Why does the add_hosts command fail so badly?

UPDATE3:
This option is actually set by Foris when you opt to have a custom domain for local devices. It then sets the dynamic leases option to ‘1’, causing all these errors.

REMAINING: Still have the problem that dns check in foris reports 2 x fail for whatever reason.


This should be merged into the other thread collecting DNS/kresd issues.
Actually, that didn’t start with 3.11.1 but with 3.11.

Regarding the “format” issue: somewhere I found a note from a user (I am really sorry, I can’t find that post anymore) mentioning that using upper case in hostnames in the hints.tmp file causes that “format” issue.

(So I went and changed all hostnames in the network to lower case, and will see after the next reboot.)

In parallel, someone noted that using “localdomain.lan” has some glitches: expansion of the domain name is needed in some places and not in others. In short, the question is “use the full name in the hostname setup in LuCI or not?” (same for DHCP in LuCI), and then “will that really work?”, because I think the related Python script is not perfect…

Thanks for this note. I think that the Foris setup and LuCI are sometimes in logical conflict, causing all those DNS/DHCP issues.

That “format” error has been in the log for as long as I can remember (the OP and I reported it quite often, with each TOS update, but it still seems to be low priority for the devs :slight_smile: and the threads were closed, so it is like a never-ending story). I really do not understand why it is the way it is; why not do it the KISS (keep it simple, stupid) way? A lot of users would be so happy to have something nice. There are so many guide/how-to posts, and maybe even more about issues with resolving and such.

So the socket issue is solved: dnsmasq (port 0), kresd (port 53) … but if there was some forced update, I missed the -opkg version of the dhcp uci config. Never mind.
I revised the hostnames (in LuCI and in /etc/hosts) and made them all lower case (dhcp.leases, dhcp.leases.dynamic and hints.tmp now contain only lower case).
I made some changes in /etc/resolver/dhcp_host_domain_ng.py (mainly /tmp/kresd changed to /srv/kresd; some paths were also leading to non-existing files, though in the default setup that is fine).

Now only four hostnames have the “format” error (those not yet in LuCI/network/hostnames). I will see after the lease time expires whether refresh/del-add works or not.

Also, the log message is maybe not fully right:
log: Wrong host format '/srv/kresd/dhcp.leases.dynamic' in host file xxx.xxx.xxx.xxx host.domain.lan
code: log("Wrong host format '%s' in host file %s " % (filename, line), LOG_ERR)
I would expect the first %s to be the entry from the file and the second the file itself. Maybe the line split is not working correctly ( host = line.strip().split()[1] )? And instead of the entry from the file, the filename is given to the hints.del routine (or maybe …
UPDATE: from the Kresd documentation, hints.del accepts a “pair” as a string (i.e. 'hostname address') or just 'hostname' … while the format in the dhcp.leases.dynamic file is “address hostname” …
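
To make the ordering difference concrete, here is a minimal sketch of splitting an “address hostname” lease line and reordering it into the 'hostname address' form described above. The function names are hypothetical and this is not the code from dhcp_host_domain_ng.py; it only assumes the two-field file layout shown earlier in the thread.

```python
def parse_lease_line(line):
    """Split one 'address hostname' line; return (address, hostname) or None."""
    fields = line.strip().split()
    if len(fields) != 2:
        # Blank or malformed line: leave it to the caller to log or skip.
        return None
    address, hostname = fields
    return address, hostname


def to_hints_pair(line):
    """Reorder a lease-file line into the 'hostname address' pair form."""
    parsed = parse_lease_line(line)
    if parsed is None:
        return None
    address, hostname = parsed
    return "%s %s" % (hostname, address)
```

For example, to_hints_pair("192.168.10.52 Grandstream-HT503.lan") gives "Grandstream-HT503.lan 192.168.10.52", while an empty line gives None.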

UPDATE2: it seems that the “host” variable/value is parsed correctly, but not all hosts pass the “allowed” check …

notes

allowed = re.compile("(?!-)[A-Z\d-]{1,63}(?<!-)$", re.IGNORECASE)

Each element of the hostname must be from 1 to 63 characters long and
the entire hostname, including the dots, can be at most 253
characters long. Valid characters for hostnames are ASCII(7) letters
from a to z, the digits from 0 to 9, and the hyphen (-). A hostname
may not start with a hyphen.

note: “_” is not officially supported, but can be used …

my small change: (?!-)[a-zA-Z\d_-]{1,63}(?<!-)
I also changed the log message a bit:

log("Host '%s' in wrong format from entry '%s' in lease file '%s'" % (host,line,filename), LOG_ERR)
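
A small sketch of how the two patterns behave on hostnames from this thread. It assumes the check is applied to each dot-separated label, which is a guess about the script and not confirmed here; the trailing `$` on the relaxed pattern and the helper name `labels_ok` are also assumptions added for illustration.

```python
import re

# Pattern quoted above: letters, digits and hyphen only, no leading/trailing hyphen.
STRICT = re.compile(r"(?!-)[A-Z\d-]{1,63}(?<!-)$", re.IGNORECASE)

# Relaxed variant with '_' added; the '$' anchor is an assumption.
RELAXED = re.compile(r"(?!-)[a-zA-Z\d_-]{1,63}(?<!-)$")


def labels_ok(hostname, pattern):
    """Return True if every dot-separated label of hostname matches pattern."""
    if len(hostname) > 253:
        return False
    # An empty label (e.g. from a trailing dot) fails the {1,63} length rule.
    return all(pattern.match(label) for label in hostname.split("."))


if __name__ == "__main__":
    for name in ("Grandstream-HT503.lan", "Raspberry_PI_3.lan", "-bad.lan"):
        print(name, labels_ok(name, STRICT), labels_ok(name, RELAXED))
```

With these assumptions, names containing an underscore fail the strict pattern and pass the relaxed one, while a label starting with a hyphen fails both.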

That bug was solved a very long time ago (2017; this thread started in 2019): “[SOLVED] Hints not working on Kresd” (SW bugs discussion, Turris forum).

So far the only reason I repeatedly see for the “check failed” lines is usage of underscores (“_”). Those aren’t allowed by the check, as you noticed. I have a WIP from this month to loosen these restrictions a bit. I couldn’t find a newer reference saying these characters are allowed in “host names”, but apparently underscores are commonly used and I can’t see why this script should always block them (if DHCP let it through, it’s too late IMO).

Yeah, I noticed later that it is really only three hosts having allowed=false (and all three with “-” in the name). So for those it is fine; besides, I made all my hosts lower case.
Anyway, add/del/refresh of dynamic leases using that refresh script is still failing due to “wrong format”, so I am still wondering what the correct format is, or whether the parsing is still failing for dynamic hosts.

yeah, it was late :slight_smile:

@vcunat
… so now, with my small update to the regexp in the allowed routine, all hostnames pass the “allowed” check, but I am still getting this “socket failed” message

(+ wrong format; + the script is looking for the file in /tmp instead of /srv … as I have a symlink I am fine, but it would be nice to take that path from the dhcp uci config file)

Initially I thought it was caused by the ports 0 vs 53, but I think I have it right (resolver > common running on 53 with preferred resolver ‘kresd’, and dhcp (dnsmasq) on port 0).

Can I check and change something to mitigate that error (or do I have to wait for the next update)?
Aside: do we know what the “correct format” for that dynamic lease file is :slight_smile: (if the host is valid and has allowed characters)? And if I disable that script completely, is there a downside?

I can’t see such error on my Omnia. I’d probably start by checking ls /tmp/kresd/tty/ that it contains exactly one file (it’s a socket, to be precise).

If you disable the script, you don’t get the DHCP-in-DNS function. (I’m not sure why you ask; there’s a checkbox in Foris for that.) The script doesn’t just create the file, it also feeds the data into the resolver.

I have quite a lot of files in /srv/kresd/tty; should I remove the old ones and keep the latest (or is there another solution)? … As I mentioned below, I am not so familiar with “resolving” …

I know how to enable/disable that, but that would also disable “dynamic domains” (set it to ‘0’), which I want to keep at ‘1’. For me, resolving and DNSSEC (checked via dig DS, DNSKEY, +dnssec) work nicely for internet hosts as well as local ones. So in general I am not sure whether I have something wrong or not, or how to fix it :slight_smile: That’s why I sometimes have silly questions all around the forum :slight_smile:

Oh, you moved this directory to persistent storage? When kresd isn’t shut down properly, no one cleans up the file (on Turris) and it remains there. The DHCP script isn’t written robustly enough to handle that. A reboot cleans /tmp and kresd itself isn’t (currently) known to crash, so normally it shouldn’t be a problem.

In any case, you want to clean this – just keep the last one; you can check its name equals the output of pidof kresd.
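
As a rough illustration of that check, the sketch below lists socket files in the tty directory whose names do not match a live kresd PID. It only reports candidates and deletes nothing. The function names are hypothetical, and the list of live PIDs is assumed to come from something like `pidof kresd`.

```python
import os
import stat


def stale_entries(names, live_pids):
    """Return the names that do not equal any live PID, sorted."""
    live = {str(pid) for pid in live_pids}
    return sorted(name for name in names if name not in live)


def stale_sockets(directory, live_pids):
    """Return paths of socket files in directory not named after a live PID."""
    result = []
    for name in stale_entries(os.listdir(directory), live_pids):
        path = os.path.join(directory, name)
        # Only consider Unix socket files; ignore anything else in the directory.
        if stat.S_ISSOCK(os.lstat(path).st_mode):
            result.append(path)
    return result
```

Something like stale_sockets("/srv/kresd/tty", [pid]) would then give the leftover files to review before removing them. Passing a list of PIDs keeps it usable if more than one kresd process is running.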

In case kresd (occasionally) crashes, that would probably be interesting to me, too. But on persistent storage the garbage files will remain e.g. after power-cycling (I think).

Yeah, I’ve moved it, because I had some issues with a full /tmp mount, so most logdir/rundir locations are there. Sometimes I have a fallback symlink in the expected location. It causes some more issues, but I can live with that.

Besides, the tty directory is now cleaned up: just one file is present, with the PID of kresd as its name. Surprisingly, there are no more socket errors in the messages and the lease refresh seems to work now (i.e. no more “format” and/or “socket” errors). … I will monitor it for a while.

This error means that the client cannot connect to the port on the computer running the server script. This can be caused by a few things, such as a lack of routing to the destination or a firewall somewhere between your client and the server (it could be on the server itself, on the client, etc.). Note that a server must perform the sequence socket(), bind(), listen(), accept() (possibly repeating the accept() to service more than one client), while a client only needs the sequence socket(), connect(). Also note that the server does not sendall()/recv() on the socket it is listening on but on the new socket returned by accept(). Try the following:

  • Check whether you really have that port listening on the server (this should tell you if your code does what you think it should): it depends on your OS, but on Linux you could do something like netstat -ntulp
  • Check from the server whether it is accepting connections: again this depends on your OS, but telnet LISTENING_IP LISTENING_PORT should do the job
  • Check whether you can access the server’s port from the client, but without using your code: just use telnet (or the appropriate command for your OS) from the client
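
The server/client sequence described above can be sketched with a loopback TCP socket. This is illustrative only, written for Python 3 (where the error is ConnectionRefusedError; the log in this thread shows the older socket.error form), and the names `try_connect` and `demo` are hypothetical. The kresd control socket discussed in this thread is a socket file, not a TCP port, but a missing listener produces the same errno 111.

```python
import socket


def try_connect(host, port, timeout=1.0):
    """Client side: socket() + connect(); report whether anything is listening."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError:
        # Errno 111 on Linux: nothing is listening on that address/port.
        return "refused"


def demo():
    # Server side: socket(), bind(), listen(); accept() would normally follow.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    while_listening = try_connect("127.0.0.1", port)
    server.close()  # the listener is gone from here on
    after_close = try_connect("127.0.0.1", port)
    return while_listening, after_close


if __name__ == "__main__":
    print(demo())
```

The second attempt fails for the same reason the DHCP script fails against a stale socket file: the name still exists, but no process is listening behind it.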

Since that last post it has been working, and I have a checker script which does some housekeeping in tty if needed (including a service restart) :). As I mentioned above (or maybe somewhere else on this forum), my problems with ports were caused by Foris vs LuCI: each one tries to put its own “option port xx” but does not remove the competing entry … I did some cleanup and I think I have it working.


As I lost my previous housekeeping script, here is a new variant:
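# Directory holding the kresd control sockets (moved to persistent storage here)
kresd_dir=/srv/kresd/tty
# PID of the running kresd; the live socket file is named after it
kresd_pid=$(pgrep kresd)
if [[ -n $kresd_pid ]]
then
	echo "Active Kresd TTY $kresd_pid found!"
	# Remove every socket file except the one named after the live PID
	find "$kresd_dir" ! -name "$kresd_pid" -type s -exec rm {} + && echo "Old Kresd TTY files were removed."
else
	echo "No active Kresd TTY found."
fi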

It simply checks for the existing, active socket/tty and removes all but that one …

Nitpick: I’d probably use pidof instead of pgrep. It’s less error-prone; e.g. it won’t match the script itself if it has “kresd” as part of its name. (-name "$kresd_pid" won’t work well with multiple PID matches)


noted, thanks.

I kind of assumed there is one active kresd process or none; handling more, hm… yep, good point.

Yes, practically I think it’s only an issue in combination with those “self-matches”.

FYI: after the last TOS/pkgupdate I found that my /srv/kresd/tty is empty and there is /srv/kresd/control instead. No big deal, I will just change the path in my housekeeping script.
