Request for new core functionality of schnapps/updater

Hi,

I think there are others like me who own, or at least administrate, a TO that is quite some physical distance away from where they normally live. We administrate our TOs via VPN.

Obviously, one of the strengths of the TO is its updater functionality.
But this strength becomes one of its weakest points when an installed update introduces a new bug that breaks the VPN functionality currently in use.
Since this can happen with nearly any update, it cannot be ruled out in advance.
So running an update while connected only via VPN may result in losing access to the router.

So what I would like to request is an option for an automatic rollback if the TO does not receive confirmation, within a defined time frame, that the VPN connection still works.
This could work just like the 15-second pop-up in Windows when changing the screen resolution: if the “keep these settings” button is not pressed, the resolution is reset to the previous settings (which are known to work).
If the TO does not get this confirmation (maybe via a button on the updater page), it should roll back to the pre-update schnapps snapshot.

I’m looking forward to your comments,
Ssdnvv

Hello

First of all, in my experience I have never lost SSH access through VPN because of automatic updates, and I am rocking the test branch on every single one of my routers. I am not saying that updates are always flawless, but for some reason I don’t have problems myself. I fully trust automatic updates.

Now to the core of your suggestion. In general it makes sense: the idea is to confirm that there is a working connection through the VPN and otherwise roll back the changes. I see two problems in making this a general solution. First, expecting the user to be available and able to click a button in Foris within X minutes after an update is pretty much unrealistic. Second, when we roll back a snapshot we have to reboot the router, which is not exactly expected behavior unless the user knows how this update-and-verify functionality works.
I have a non-systematic solution for you. The updater has /etc/updater/hook_postupdate. You can add a script there that checks the VPN and, when a problem is detected, rolls back the update, disables the updater and sends a notification. The detection can be something like SSH access through some VPN host back to the router. This is not an ultimate solution, because you need a configured host on the VPN network that is always alive, but it should work reasonably well.
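
A minimal sketch of such a hook, assuming /etc/updater/hook_postupdate is a directory whose executable scripts the updater runs after an update (check this on your TurrisOS version). The helper host, the router's VPN address, the retry values and the script name vpn-check are hypothetical, and key-based SSH login is assumed to be set up from the router to the helper and from the helper back to the router. `schnapps rollback` without a number is assumed to return to the latest snapshot; if a post-update snapshot already exists at that point, pass the pre-update number from `schnapps list` instead. Disabling the updater and sending a notification are left as placeholders.

```shell
#!/bin/sh
# Sketch of a post-update check, e.g. /etc/updater/hook_postupdate/vpn-check
# (HYPOTHETICAL path/name). Proves the router is still reachable through the
# VPN by asking an always-on helper host to SSH back in; rolls back otherwise.

# HYPOTHETICAL values: adjust to your own network.
HELPER_HOST="${HELPER_HOST:-admin@helper.vpn.example}"  # host on the VPN
ROUTER_VPN_ADDR="${ROUTER_VPN_ADDR:-10.8.0.1}"          # router address inside the VPN
RETRIES="${RETRIES:-5}"
RETRY_DELAY="${RETRY_DELAY:-30}"                        # seconds between attempts

# Round trip router -> helper -> router over the VPN. BatchMode makes ssh fail
# instead of prompting for a password; key-based login must be set up both ways.
vpn_roundtrip_ok() {
    ssh -o BatchMode=yes -o ConnectTimeout=10 "$HELPER_HOST" \
        "ssh -o BatchMode=yes -o ConnectTimeout=10 root@$ROUTER_VPN_ADDR true"
}

# The VPN may need a while to come back after an update, so retry a few times.
check_with_retries() {
    n=1
    until vpn_roundtrip_ok; do
        if [ "$n" -ge "$RETRIES" ]; then
            return 1
        fi
        n=$((n + 1))
        sleep "$RETRY_DELAY"
    done
    return 0
}

on_failure() {
    logger -t vpn-check "VPN round trip failed, rolling back"
    # ASSUMPTION: without a number, schnapps rolls back to the latest snapshot.
    schnapps rollback
    # HYPOTHETICAL placeholders: disable the updater and send a notification
    # here with whatever mechanism your TurrisOS version provides.
    reboot   # the rollback only takes effect after a reboot
}

# Act only when run from the hook directory, so the file can be sourced for tests.
case "$0" in
    */hook_postupdate/*|*/hook_postupdate)
        check_with_retries || on_failure ;;
esac
```

The round trip is used instead of a ping because it only succeeds when an SSH session into the router through the VPN actually works.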


Thanks for your quick answer.
I already thought of the postupdate hook, but that can get quite complex, because there might be several issues that cannot be detected by standard methods like ping <standard route for VPN>. I’m writing this precisely because I had exactly that problem, introduced by one of the updates after 3.10.3 (I can’t tell exactly which), which broke my OpenVPN connection: the connection gets established and ping works, but I still cannot access the remote network. And there might be plenty of other combinations of issues. [I gave up on OpenVPN and will try WireGuard over the next few days.]

As for the method itself, I was not talking about automatic updates, which I do not trust at all: when I had automatic updates enabled, they always broke my network. Every single time. Right until you and your colleagues introduced update approvals :wink:
I thought about the following procedure:

  1. Start the update manually, either via the CLI or Foris
  2. Wait until the update is done
  3. Reboot (this is currently not done automatically, so you have to stay connected via VPN until the update is done anyway)
  4. After that, the VPN connection drops and tries to reconnect
  5. If the reconnection succeeds, press the corresponding Foris button or run a CLI command to tell the TO that the connection still works; if no confirmation arrives within the time frame, the TO rolls back
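
A minimal sketch of this procedure, assuming the countdown has to survive the reboot in step 3 (a timer held only in RAM would not). It keeps the number of a known-good snapshot in a marker file and starts the countdown at boot. The script name update-guard, the marker path and the call from /etc/rc.local (which OpenWrt runs at the end of boot) are hypothetical choices, not an existing TurrisOS feature.

```shell
#!/bin/sh
# HYPOTHETICAL helper, e.g. installed as /usr/bin/update-guard.
# Confirm-or-rollback that survives the reboot: the armed snapshot number is
# kept in a file, and `update-guard boot` (called from /etc/rc.local) starts
# the countdown after every boot while something is armed.

GUARD_FILE="${GUARD_FILE:-/root/.update-guard}"  # HYPOTHETICAL path
TIMEOUT="${TIMEOUT:-900}"                        # seconds to confirm after boot

arm() {
    # $1: number of the known-good snapshot, as shown by `schnapps list`
    case "$1" in
        ''|*[!0-9]*) echo "arm: snapshot number required" >&2; return 2 ;;
    esac
    printf '%s\n' "$1" > "$GUARD_FILE"
}

confirm() {
    # Run this once you are back in through the VPN.
    rm -f "$GUARD_FILE"
}

pending_snapshot() {
    # Prints the armed snapshot number; fails when nothing is armed.
    [ -f "$GUARD_FILE" ] && cat "$GUARD_FILE"
}

do_rollback() {
    logger -t update-guard "no confirmation, rolling back to snapshot $1"
    schnapps rollback "$1" && reboot
}

boot() {
    snap=$(pending_snapshot) || return 0   # nothing armed: normal boot
    (
        sleep "$TIMEOUT"
        if [ -f "$GUARD_FILE" ]; then
            confirm   # clear the marker so this state does not trigger another rollback
            do_rollback "$snap"
        fi
    ) &
}

case "$0" in
    */update-guard)
        case "${1:-}" in
            arm) arm "${2:-}" ;;
            confirm) confirm ;;
            boot) boot ;;
            *) echo "usage: update-guard arm N | confirm | boot" >&2; exit 2 ;;
        esac ;;
esac
```

Usage would follow the steps: create a snapshot (for example with `schnapps create`), run `update-guard arm N` with its number, update and reboot, then run `update-guard confirm` after the VPN reconnects. The snapshot must be taken before arming; otherwise the restored system still contains the marker file and starts the countdown again.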

edit: I really appreciate your work. Having an updater like this is unique, but it is complex. That’s why I am requesting this feature.

I think the point is not just to ping some address but to simulate full router access through SSH. If you can reach some remote client on the VPN (or even outside of it) that is able to SSH back to the router through the VPN, then you know there is at least one machine you can access that was able to access the router after the update. This is what I meant by detection. If this test passes, your network works at least well enough for you to fix it, which means a rollback is not necessary.

I think we probably won’t implement anything like that in the near future, simply because we pursue automatic updates, not manual ones. As you noted, this feature does not make sense with automatic updates. In general, I don’t see why you couldn’t do this on your own when you are doing manual updates. You can just script it.

My second point is the question of why you have so many network problems with updates. Is it possible that your configuration is not set up correctly and is often overwritten by updates? I think you should invest more time in the configuration. You should ensure that everything works even after a reboot, and also that you did not edit any file that is going to be overwritten by the next update. The first check should be clear. The second can be done with the pkg_check command. Note that pkg_check also reports some files that are legitimately modified.

Edit: Yes, the updater is complex. I won’t argue with that, and it is also true that almost no one understands it completely. The question is whether this feature makes sense in combination with the updater. The updater, after all, is something other than apt; using it as such is possible, but not entirely intended. Should I invest time in features that limit automatic updates or in features that improve them? Simply put: isn’t this out of scope, and wouldn’t it be better to solve the core problem?


When you’re about to try the update manually, I’d just script launching a screen session running “sleep 600; schnapps rollback; reboot” or something like that. If you manage to connect to the TO after the update, you just kill the screen session. If not, 10 minutes later your access is (hopefully) restored.
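
The one-liner above needs screen to be installed. A sketch of the same dead-man switch without screen, assuming nohup is available (it is part of coreutils and an optional BusyBox applet), runs the timer as a detached background job and stores its PID so it can be cancelled. The file name rollback-guard.sh, the function names and the variables are hypothetical. Like the screen session, the timer lives in RAM and /tmp, so it does not survive a reboot and only guards updates that do not need one.

```shell
#!/bin/sh
# Dead-man switch for a manual update, sourced into the SSH session:
#   . ./rollback-guard.sh && guard_arm      (before the update)
#   . ./rollback-guard.sh && guard_disarm   (after reconnecting)
# HYPOTHETICAL names; GUARD_CMD defaults to the command from the post.

GUARD_PIDFILE="${GUARD_PIDFILE:-/tmp/rollback-guard.pid}"
GUARD_DELAY="${GUARD_DELAY:-600}"                    # 10 minutes
GUARD_CMD="${GUARD_CMD:-schnapps rollback; reboot}"

guard_arm() {
    # nohup ignores the hangup sent when the SSH session drops, and &
    # puts the timer in the background so the shell returns immediately.
    nohup sh -c "sleep $GUARD_DELAY; $GUARD_CMD" >/dev/null 2>&1 &
    echo "$!" > "$GUARD_PIDFILE"
}

guard_disarm() {
    # Killing the waiting shell means the rollback command never runs.
    [ -f "$GUARD_PIDFILE" ] || return 1
    kill "$(cat "$GUARD_PIDFILE")" 2>/dev/null
    rm -f "$GUARD_PIDFILE"
}
```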