Restart services like OpenVPN/LXC on network restart?

davidjb · March 9, 2019, 1:13pm

I run a number of network-based services within LXC containers on my Turris Omnia and I’ve found that they stop being able to communicate with the network if the TO’s network is restarted. OpenVPN’s routing is also affected, and it has to be restarted before it starts functioning again. Restarting the network entirely is pretty common in Foris now: changing settings (LAN/WiFi/network/VPN etc.) sees the network restarted for the changes to take effect.

In order for the services to regain network access, I need to stop and then start the containers, or restart OpenVPN, which is particularly challenging as I need to actively remember to do it. If I’m accessing the router remotely or happen to forget, the containers can be left network-less, or the VPN inaccessible, until I can fix the problem.

Ideally, it feels like this is something that should be built in to Turris, so that services and LXC containers that depend on the network are gracefully restarted. Alternatively, said services could be made more resilient to a network restart, which might be more fiddly. That said, if this sort of ability doesn’t exist, is there a location for network-related event/hook scripts so I can at least automate the restarts myself to bring services back online?

AreYouLoco · March 11, 2019, 1:50pm

What kind of services do you run? I don’t have experience with OpenVPN yet, but I am using a connection-checking script on my Omnia.

It pings 3 websites and, if that fails a few times, restarts the connection. I am sure you can modify it a bit and run it inside the container to check whether ping works. If it doesn’t, just restart the service, instead of the network interface as in my case.

script.sh

#!/bin/sh
# Enter the FQDNs you want to check with ping (space separated)
# The script does nothing if any ping to any FQDN succeeds
FQDN="www.google.com"
FQDN="$FQDN wiki.openwrt.org"
FQDN="$FQDN www.turris.cz"
# Sleep between ping checks of a FQDN (seconds between pings)
SLEEP=3          # Sleep time between each retry
RETRY=5          # Retry each FQDN $RETRY times
SLEEP_MAIN=15    # Main loop sleep time
check_connection()
{
    for NAME in $FQDN; do
        for i in $(seq 1 $RETRY); do
            ping -c 1 "$NAME" > /dev/null 2>&1
            if [ $? -eq 0 ]; then
                return 0
            fi
            sleep $SLEEP
        done
    done
    # If we are here, it means all failed
    return 1
}
while true; do
    check_connection
    if [ $? -ne 0 ]; then
        # Command to run if pinging fails
        %YOUR_COMMAND_HERE%
    fi
    sleep $SLEEP_MAIN
done

Take a look. You just need a command that restarts the OpenVPN service. I think restarting the whole container is a bit overkill and not necessary (depending on your services).
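A minimal sketch of what could take the place of the placeholder, assuming the service is restarted through an init script at /etc/init.d/openvpn (adjust for your container). The variable and function names here are illustrative and not part of the script above:

```shell
#!/bin/sh
# Assumed values, override them for your own setup.
CHECK_HOSTS="${CHECK_HOSTS:-www.turris.cz wiki.openwrt.org}"
RESTART_CMD="${RESTART_CMD:-/etc/init.d/openvpn restart}"
PING_CMD="${PING_CMD:-ping -c 1 -W 2}"

# Return 0 as soon as any host answers a single ping, 1 if none do.
any_host_reachable() {
    for host in $CHECK_HOSTS; do
        $PING_CMD "$host" >/dev/null 2>&1 && return 0
    done
    return 1
}

# Run the restart command only when no host is reachable.
restart_if_unreachable() {
    if any_host_reachable; then
        return 0
    fi
    $RESTART_CMD
}
```

Calling restart_if_unreachable from the main loop (or from cron) restarts only the service, not the whole container.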

davidjb · March 13, 2019, 12:34pm

Thanks for the suggestions. Connection checking could be a workaround inside the LXC containers, if a reboot inside the container affects the container’s networking on the host in the same way that lxc-stop / lxc-start does. I’ll try this out and see if I can pin down exactly why the network or routing isn’t functional.

On the host side of things, a restart of the Turris Omnia’s network means that there’s only the briefest of moments of lost network connectivity. So in this case, a cron script on the host would have to be permanently pinging a target to notice a momentary loss in network, and that could be fraught with false-positives.

I know I could restart OpenVPN periodically (eg every hour or day) and I might do that in the meantime, but it would be ideal to have a non-hacky way of keeping the router’s services working after a network restart.
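A rough sketch of that stopgap, assuming an OpenWrt-style root crontab at /etc/crontabs/root and an init script at /etc/init.d/openvpn. The helper name add_cron_line is made up for illustration:

```shell
#!/bin/sh
# Append a crontab line to a file unless an identical line is already present.
add_cron_line() {
    cron_file="$1"
    cron_line="$2"
    touch "$cron_file"
    grep -qxF -e "$cron_line" "$cron_file" || printf '%s\n' "$cron_line" >> "$cron_file"
}

# Example (assumed path): restart OpenVPN at minute 0 of every hour.
# add_cron_line /etc/crontabs/root '0 * * * * /etc/init.d/openvpn restart'
```

Depending on the system, cron may need to be restarted before it picks up the new entry.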

Twinkie · March 13, 2019, 1:52pm

Inside the LXC container you can achieve this by installing monit:

apt-get install monit

Then you have to edit the files in /etc/monit/conf-enabled and create separate files for the services that you want to monitor and restart.

Then you can google "monit openvpn example" and adjust what you find for your needs.

Here is an example monit config for OpenVPN, but there are more on the net:

check process vpn-network with pidfile /var/run/vpn-network.pid
    start program = "/etc/init.d/openvpn start vpn-network.com"
    stop program = "/etc/init.d/openvpn stop vpn-network.com"

check host tap0 with address 1.1.1.1
    start program = "/etc/init.d/openvpn start vpn-network.com"
    stop program = "/etc/init.d/openvpn stop vpn-network.com"
    if failed
        icmp type echo count 5 with timeout 15 seconds
    then restart

The great thing is that you can also monitor other daemons and make sure monit restarts them when necessary.

You can also install monit on Turris/OpenWrt itself with opkg install monit, as it is part of the Turris packages. I could not find an example monit config for LXC, but I suppose that with a bit of googling or experimenting, LXC containers could also be monitored and restarted by monit.

monit has a web interface that can be accessed with a login/password at http://your_lxc_container_ip:2812 by default, or you can check the status from an SSH console with the monit status command.

davidjb · March 24, 2019, 8:31am

Thanks for the suggestion. monit could be an option, but it’s more or less a bandaid over the top of the actual problem, where restarting the network leaves the services in a broken state. Plus, you’d have to get monit to recognise failure, and that may be hard for a half-functioning OpenVPN process (eg still running and connectable but failing to route).

Ideally, the OS would keep track of these dependencies so that restarting the network restarts the child dependencies in some manner – so I’ve opened https://gitlab.labs.nic.cz/turris/turris-os-packages/issues/343.

In the meantime, I’ve worked around the problem by appending my manual commands for restarting LXC/OpenVPN to the end of /usr/bin/maintain-network-restart, so at least when I make changes to WiFi, network, etc. in Foris, the router’s services come back okay.
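As a sketch only, the appended commands could look something like the following. The container names are hypothetical, and the commands are kept in variables so they are easy to swap out; the OpenVPN init script path is assumed to be /etc/init.d/openvpn:

```shell
#!/bin/sh
# Hypothetical container names, replace with your own.
CONTAINERS="${CONTAINERS:-container1 container2}"
LXC_STOP="${LXC_STOP:-lxc-stop}"
LXC_START="${LXC_START:-lxc-start}"
OPENVPN_INIT="${OPENVPN_INIT:-/etc/init.d/openvpn}"

# Stop and start each container, then restart OpenVPN.
restart_dependents() {
    for name in $CONTAINERS; do
        $LXC_STOP -n "$name"
        $LXC_START -n "$name" -d
    done
    $OPENVPN_INIT restart
}
```

The function would be called as the last step of the script. This stays specific to one setup, which is why a generic mechanism would be preferable.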

Twinkie · March 24, 2019, 10:41am

Well, if you think so. I would not call it a band-aid, as I use monit to monitor both the OpenWrt services themselves and 2 LXC containers (pihole & debian). It works perfectly for me. Monit has many built-in features for detecting broken or crashed services at the network and protocol level in order to restart them. And if a service is in a good state there is no restart at all. From my point of view there is no better solution, but you are probably looking for a solution perfectly tailored to your situation, which you may have to implement yourself.

I doubt opening an issue will help, as the Turris team is obviously busy with MOX and TOS4, leaving even basic packages not upgraded for years in a situation where there are alternatives.

Pepe · March 24, 2019, 10:59am

This part is OT and doesn’t belong in this thread. It was already discussed in the thread “Discussion about TOS 3.x and 4.x” (General discussion, Turris forum), and we also told you in some thread why it is so and why it cannot be done easily. Turris OS 4.0 alpha has already been released for Turris MOX and Turris Omnia. If you’d like, you can try it. Any patches or pull requests are welcome. I don’t think there’s any need to say it again and again, because it helps neither us nor others. We’re aware of this state and we’re doing what we can.

Pepe · March 24, 2019, 10:59am

Thank you! We will look into it and see if there’s anything we can do about it.

davidjb · March 24, 2019, 11:08am

Thanks very much @Pepe! Hope the detailed description of the issue helps – let me know if you need more detail. I can understand LXC is a little more ‘exotic’ in terms of configuration, but with OpenVPN having Foris integration, it has been confusing to find the OpenVPN server not routing correctly after changing network settings (like WiFi, which appears unrelated).

I think something like network hook scripts (triggered on calling /etc/init.d/network) might be a good middle ground, if something like that doesn’t already exist. For instance, the OpenVPN package (and any other packages) could install scripts to restart it on network restart and you could add your own script to restart anything else like LXC containers.
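To illustrate the idea, a hook runner could be as small as the sketch below. The directory name /etc/network-restart.d is hypothetical and nothing like it is known to exist on Turris OS; the function simply runs every executable file in a directory, in name order:

```shell
#!/bin/sh
# Run every executable file in a hook directory, in glob (alphabetical) order.
# A failing hook is reported on stderr but does not stop the remaining hooks.
run_network_hooks() {
    hook_dir="$1"
    for hook in "$hook_dir"/*; do
        [ -x "$hook" ] || continue
        "$hook" || echo "hook failed: $hook" >&2
    done
}

# Example (hypothetical directory), to be called after the network restart:
# run_network_hooks /etc/network-restart.d
```

Packages such as OpenVPN could then drop their own restart script into that directory, and users could add one for their LXC containers.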

My current workaround isn’t overly great as it’s very specific but it’s at least a starting point to illustrate the issue.

peci1 · August 27, 2023, 11:36pm

I’ve updated the issue “turris-maintain: Network restart leaves net-related services in a broken state” (#343) in the Turris OS packages GitLab with a new idea: manually adding the veth interface to the bridge resolves the problem. So a hotplug script that does just this should fix the issue. We can either wait until somebody writes a generic hotplug script or, if your setup is fixed, you can just write a one-liner using brctl addif and your static network interface names.
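A minimal sketch of such a script, under several assumptions: the bridge is called br-lan, the container's host-side interface has the static name veth0, and the file is placed under /etc/hotplug.d/iface/, where OpenWrt sets ACTION and INTERFACE when a logical interface changes state. None of these names come from the issue itself:

```shell
#!/bin/sh
# Assumed names, replace with your bridge and your static veth interface names.
BRIDGE="${BRIDGE:-br-lan}"
VETH_IFACES="${VETH_IFACES:-veth0}"
BRCTL="${BRCTL:-brctl}"

# Re-add each veth interface to the bridge; report failures but keep going.
readd_veths() {
    for iface in $VETH_IFACES; do
        $BRCTL addif "$BRIDGE" "$iface" || echo "could not add $iface to $BRIDGE" >&2
    done
}

# When run as an iface hotplug script, act only once the lan interface is up.
if [ "${ACTION:-}" = "ifup" ] && [ "${INTERFACE:-}" = "lan" ]; then
    readd_veths
fi
```

With a single container this collapses to the one-liner brctl addif br-lan veth0, again using assumed interface names.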