BIND server hangs after some time (about 1 day on average)

Hi!
I’ve been using BIND as a DNS server on my Turris Omnia for some time, but since the Turris OS 5.0 update it completely freezes after a while. (I can’t even terminate it; I have to kill it with SIGKILL.) I’ve tried running it with verbose debugging and didn’t see any obvious errors.

Is anyone else having the same problem? Any ideas how to debug it further?

Thanks in advance!

Anything in the logs?
Can you install strace and run

strace -p <PID of named>

?
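A minimal sketch of the same idea, assuming a Linux system with pidof and strace installed (the function name trace_named and the output path are hypothetical). Plain strace -p follows only the thread whose ID equals the PID, while named runs several threads, so adding -f attaches to all of them:

```sh
# Attach strace to every thread of the running named process.
# With -p, the -f flag makes strace attach to all existing threads,
# not only the main one whose thread ID equals the PID.
trace_named() {
    pid="$(pidof named)" || { echo "named is not running" >&2; return 1; }
    # -tt: timestamp each syscall; -o: write the trace to a file (default below)
    strace -f -tt -p "$pid" -o "${1:-/tmp/named.strace}"
}
```

Run it as root, wait for the hang, then stop strace with Ctrl+C. Because of -f, each line in the file is prefixed with the ID of the thread that made the call, so you can see what every thread was doing last.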

Hi @Surakus,

Could you please tell us more about your BIND setup, if possible? It would also help to know whether you are using a Turris Omnia or a Turris MOX: even though both use the same target (mvebu), there might be differences in the source code. You are using BIND 9.14.12, am I correct?

All of these details will help us reproduce the issue.

Nothing in the logs :frowning: This is the strace experiment:

strace: Process 4552 attached
rt_sigtimedwait([HUP INT TERM],

This was the only output until it stopped responding. Then I sent several SIGHUPs to see what would happen:

{si_signo=SIGHUP, si_code=SI_USER, si_pid=2117, si_uid=0}, NULL, 8) = 1 (SIGHUP)
rt_sigtimedwait([HUP INT TERM], {si_signo=SIGHUP, si_code=SI_USER, si_pid=2117, si_uid=0}, NULL, 8) = 1 (SIGHUP)
rt_sigtimedwait([HUP INT TERM], {si_signo=SIGHUP, si_code=SI_USER, si_pid=2117, si_uid=0}, NULL, 8) = 1 (SIGHUP)
rt_sigtimedwait([HUP INT TERM], {si_signo=SIGHUP, si_code=SI_USER, si_pid=2117, si_uid=0}, NULL, 8) = 1 (SIGHUP)
rt_sigtimedwait([HUP INT TERM], {si_signo=SIGHUP, si_code=SI_USER, si_pid=2117, si_uid=0}, NULL, 8) = 1 (SIGHUP)

And then a SIGTERM:

rt_sigtimedwait([HUP INT TERM], {si_signo=SIGTERM, si_code=SI_USER, si_pid=2117, si_uid=0}, NULL, 8) = 15 (SIGTERM)
futex(0xb69d2d88, FUTEX_WAIT_PRIVATE, 1, NULL) = ?

After this I could only SIGKILL it.
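One possible reading of the trace above (an interpretation, not a confirmed diagnosis): the thread strace attached to is named’s main thread, which parks in rt_sigtimedwait waiting for HUP, INT or TERM. It picked up every SIGHUP, so that loop was still alive. On SIGTERM it began shutting down and then blocked in futex(FUTEX_WAIT_PRIVATE) with no timeout (NULL), which at the syscall level is what waiting on a lock or on another thread looks like; "= ?" means the call never returned. That points at one of the other threads, which were not traced. A minimal sketch, assuming Linux /proc, that lists the name and scheduler state of each thread (the function name thread_states is hypothetical):

```sh
# Print one line per thread of the given PID: thread ID, thread name, state.
# State letters come from /proc: R running, S sleeping, D uninterruptible, etc.
thread_states() {
    pid="$1"
    for task in /proc/"$pid"/task/*; do
        tid="${task##*/}"
        name="$(cat "$task/comm")"
        state="$(awk '/^State:/ {print $2}' "$task/status")"
        printf '%s %s %s\n' "$tid" "$name" "$state"
    done
}
```

During a hang, run thread_states "$(pidof named)" and compare the output with a healthy instance. If gdb happens to be installed, gdb -p PID -batch -ex 'thread apply all bt' prints a backtrace of every thread, although backtraces of stripped binaries are less informative.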

This is a Turris Omnia running BIND 9.14.12.

It’s set up to delegate my local domain to a different server and everything else to kresd on a different port.

The configuration is exactly the same as it was before the update to Turris OS 5.0.
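A minimal sketch of such a setup, assuming the local domain is handled by a forward zone (a stub or static-stub zone would be alternatives). The domain home.lan, the addresses and kresd’s port 5353 are hypothetical placeholders, not values from this thread. The snippet writes the example to a temporary file, leaving the real configuration untouched, and checks its syntax if BIND’s named-checkconf is available:

```sh
# Write an example named.conf to a temporary file.
conf="$(mktemp)"
cat > "$conf" <<'EOF'
options {
    recursion yes;
    # Anything not matched by a zone below goes to kresd (hypothetical port 5353).
    forward only;
    forwarders { 127.0.0.1 port 5353; };
};

# The local domain goes to a separate server (hypothetical name and address).
zone "home.lan" {
    type forward;
    forward only;
    forwarders { 192.168.1.10; };
};
EOF

# Check the syntax only if the BIND tools are installed.
if command -v named-checkconf >/dev/null 2>&1; then
    named-checkconf "$conf"
fi
```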

I see that BIND 9.14.12 is end-of-life; it comes from OpenWrt 19.07. I will update it there, and once the update is merged, it will be part of the upcoming fix-up release of Turris OS 5.0.

I opened a pull request to update BIND in OpenWrt 19.07 yesterday.

You can track its status there. If it gets merged, it will be part of the upcoming Turris OS 5.0 updates.

Sounds good :slight_smile: Can I get the new package somehow? (I’m trying to build it myself, but it’ll take some time until I figure it out :smiley: )

I think building is explained at https://gitlab.nic.cz/turris/turris-build/-/blob/hbk/README.adoc

I managed to build it and I’m testing it now; we’ll see if it hangs again :slight_smile: I’ll keep you posted.

After 11 days of testing I can say that this has not happened again :slight_smile:

Hi @Surakus,

Thank you for testing it and letting us know that it helped in your case. I merged my pull request, and it will be included in the upcoming Turris OS 5.0.4.

It would be better to let the BIND developers know about it, but as the 9.14.x series is no longer supported, I don’t think it makes much sense. If you see it differently, feel free to report it.