Kresd (knot-resolver) in a crash loop and DNS stopped working (a bug?)

For some reason my kresd (knot-resolver) started crashing consistently. Eventually DNS stopped working.
The logs show a failed assertion. It looks like a bug to me. Can anyone help me out?

Here are the logs:

Jul  9 11:54:56 turris kresd[11470]: [ ta ] warning: overriding previously set trust anchors for .
Jul  9 11:54:56 turris kresd[11470]: [ta_update] refreshing TA for .
Jul  9 11:54:56 turris kresd[11470]: [ta_update] key: 20326 state: Valid
Jul  9 11:54:56 turris kresd[11470]: [ta_update] next refresh for . in 24 hours
Jul  9 11:55:01 turris kresd[11470]: Assertion failed: xn <= n (gmp-glue.c: _nettle_mpz_limbs_copy: 178)
Jul  9 11:55:01 turris crond[11477]: (root) CMD (/usr/bin/notifier)
Jul  9 11:55:01 turris crond[11476]: (root) CMDOUT (There is no message to send.)
Jul  9 11:55:06 turris kresd[11507]: [ ta ] warning: overriding previously set trust anchors for .
Jul  9 11:55:06 turris kresd[11507]: [ta_update] refreshing TA for .
Jul  9 11:55:06 turris kresd[11507]: [ta_update] key: 20326 state: Valid
Jul  9 11:55:06 turris kresd[11507]: [ta_update] next refresh for . in 24 hours
Jul  9 11:55:11 turris kresd[11507]: Assertion failed: xn <= n (gmp-glue.c: _nettle_mpz_limbs_copy: 178)
Jul  9 11:55:16 turris kresd[11519]: [ ta ] warning: overriding previously set trust anchors for .
Jul  9 11:55:16 turris kresd[11519]: [ta_update] refreshing TA for .
Jul  9 11:55:16 turris kresd[11519]: [ta_update] key: 20326 state: Valid
Jul  9 11:55:16 turris kresd[11519]: [ta_update] next refresh for . in 24 hours
Jul  9 11:55:22 turris kresd[11519]: Assertion failed: xn <= n (gmp-glue.c: _nettle_mpz_limbs_copy: 178)
Jul  9 11:55:22 turris procd: Instance kresd::instance1 s in a crash loop 6 crashes, 5 seconds since last crash

Here is my configuration:

root@turris:~# cat /etc/config/resolver
config resolver 'common'
        list interface '0.0.0.0'
        list interface '::0'
        option port '53'
        option keyfile '/etc/root.keys'
        option verbose '0'
        option msg_buffer_size '65552'
        option msg_cache_size '20M'
        option net_ipv6 '1'
        option net_ipv4 '1'
        option forward_upstream '0'
        option prefered_resolver 'kresd'
        option ignore_root_key '0'
        option prefetch 'yes'
        option static_domains '1'
        option dynamic_domains '0'
        option edns_buffer_size '1232'

config resolver 'kresd'
        option rundir '/tmp/kresd'
        option log_stderr '1'
        option log_stdout '1'
        option forks '1'
        option keep_cache '1'
        option include_config '/etc/kresd/custom.conf'
        list hostname_config '/etc/hosts'


root@turris:~# cat /etc/kresd/custom.conf
-- make sure to include this config with
-- uci set resolver.kresd.include_config=/etc/kresd/custom.conf
-- see https://knot-resolver.readthedocs.io/en/stable/modules-policy.html

trust_anchors.add_file("/etc/kresd/kresd_root.keys", readwrite)

cache.ns_tout(1000)

-- DNS over VPN
-- Forward all dns queries to a server behind my VPN
policy.add(policy.all(
       policy.FORWARD({'192.168.2.13'})
))

I don’t see why you’d want the trust_anchors.add_file line, as trust anchors are already handled by default. I have no idea yet why the assertion fails, and this is the first time I’ve heard of it, but removing that config line might work around it.
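A minimal sketch of that workaround, assuming a POSIX sh and a sed that supports -i (BusyBox and GNU sed both do). It comments the line out with a Lua "-- " prefix instead of deleting it, so it is easy to restore later. The function name disable_ta_add_file is hypothetical, not part of Turris or Knot Resolver.

```shell
#!/bin/sh
# Hypothetical helper: comment out every trust_anchors.add_file(...) call in
# the given kresd Lua config file. Lines that are already commented out
# ("-- trust_anchors...") no longer match, so running it twice is harmless.
disable_ta_add_file() {
    conf="$1"
    # BRE: "(" is literal here; "&" re-inserts the whole matched prefix.
    sed -i 's/^[[:space:]]*trust_anchors\.add_file(/-- &/' "$conf"
}
```

Usage would be disable_ta_add_file /etc/kresd/custom.conf followed by a resolver restart. The procd name kresd::instance1 in the logs suggests the init script is /etc/init.d/kresd, but that is an assumption about this particular Turris setup.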

EDIT: by the way, readwrite here is an undefined identifier, so Lua evaluates it to nil.

You are right, that line is not needed. I removed it and restarted kresd. It works for about 10 minutes and then starts crashing again:

Jul  9 13:04:21 turris kresd[16255]: [ta_update] refreshing TA for .
Jul  9 13:04:22 turris kresd[16255]: [ta_update] next refresh for . in 13.691666666667 hours
Jul  9 13:05:01 turris crond[16316]: (root) CMD (/usr/bin/notifier)
Jul  9 13:05:01 turris crond[16315]: (root) CMDOUT (There is no message to send.)
Jul  9 13:10:01 turris crond[16648]: (root) CMD (/usr/bin/notifier)
Jul  9 13:10:02 turris crond[16647]: (root) CMDOUT (There is no message to send.)
Jul  9 13:12:01 turris crond[16801]: (root) CMD (/usr/sbin/logrotate -s /tmp/logrotate.state /etc/logrotate.conf)
Jul  9 13:14:07 turris kresd[16255]: Assertion failed: xn <= n (gmp-glue.c: _nettle_mpz_limbs_copy: 178)
Jul  9 13:14:12 turris kresd[16933]: [ta_update] refreshing TA for .
Jul  9 13:14:13 turris kresd[16933]: [ta_update] next refresh for . in 13.609583333333 hours
Jul  9 13:14:17 turris kresd[16933]: Assertion failed: xn <= n (gmp-glue.c: _nettle_mpz_limbs_copy: 178)
Jul  9 13:14:22 turris kresd[16943]: [ta_update] refreshing TA for .
Jul  9 13:14:22 turris kresd[16943]: [ta_update] next refresh for . in 13.608194444444 hours
Jul  9 13:14:28 turris kresd[16943]: Assertion failed: xn <= n (gmp-glue.c: _nettle_mpz_limbs_copy: 178)
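To put numbers on the "works for 10 minutes, then crashes" pattern, here is a small sketch that reads syslog lines like the ones above on stdin and prints, for each kresd PID, the time of its first log line and the time of its assertion failure. It assumes the syslog layout shown here (month, day, time, host, then kresd[PID]:); the function name kresd_crash_times is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: for each kresd PID in syslog-style input on stdin,
# print when it first logged and when it hit the nettle assertion.
kresd_crash_times() {
    awk '
    # With default whitespace splitting, field 5 looks like "kresd[16255]:".
    $5 ~ /^kresd\[[0-9]+]:$/ {
        pid = $5
        gsub(/[^0-9]/, "", pid)              # keep only the numeric PID
        if (!(pid in first)) { first[pid] = $3; order[++n] = pid }
        if ($0 ~ /Assertion failed: xn <= n/) crash[pid] = $3
    }
    END {
        # Print PIDs in the order they first appeared.
        for (i = 1; i <= n; i++) {
            p = order[i]
            printf "%s start=%s crash=%s\n", p, first[p], ((p in crash) ? crash[p] : "none")
        }
    }'
}
```

On the log above this would show PID 16255 running from 13:04:21 until 13:14:07, while 16933 and 16943 each crashed only five to six seconds after starting. On the router itself, piping logread into the function is one option, assuming the default OpenWrt syslog format.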