Switch ports offline if plugged during boot

Great @davidhaluska, thanks for your efforts! I have not yet received a reply from Turris on my report of the same issue (and a few others). Good to know that someone is home over there.

I have the same switch and the same problem… I have had to disable automatic updates because I’m often away from home and need remote access to my servers here while I’m out. The automatic reboot caused my systems to become unavailable at random intervals. I’d really like to turn automatic updates back on for security reasons. Please let us know when the update with the fix is out!

Hi guys! The kernel patch that actually does almost the same thing as adminX’s fix-switch.c is going to the test branch (https://gitlab.labs.nic.cz/turris/openwrt/commit/22c8417eae25dd1d259baf393a12fda53ac24c0d) and it will be included in the next release as a hot bugfix.

Many thanks for the effort you all put into narrowing down and resolving this problem. Simple as resetting each PHY in the switch chip may be, it took us quite a long time to reproduce the problem and finally debug it, and it would have taken even longer without your help. So thank you again!

Tomas

And we have a testing image: https://api.turris.cz/openwrt-repo/omnia-dev-tms/medkit/omnia-medkit-latest-minimal.tar.gz

(You can write this image to the router from a USB disk using the standard method described at https://www.turris.cz/doc/en/howto/omnia_factory_reset . Once we have released the fix, please replace the image with the mainline one (https://api.turris.cz/openwrt-repo/omnia/medkit/omnia-medkit-latest-full.tar.gz); otherwise updates will not work properly if you keep using the testing base image.)

We would be glad if you could test the image to see whether it really fixes all the switch reset related problems.

Thanks again!

brill, if I flash using the medkit, can I still roll back to one of my prior snapshots once I’m done testing the fix? I have a working configuration (that uses fix-switch) that I’d like to return to after testing your fix.

Hi! Unfortunately no… Flashing the router from USB completely wipes the eMMC and replaces the factory defaults snapshot and all other snapshots as well.

Well, maybe you can try adding the dev-tms branch (which contains the testing kernel) to your updater sources. That might be less intrusive, and you would be able to return to the snapshot afterwards. I’ll try it here and write up a short how-to.

Cheers, Tomas

Theoretically, you could use “btrfs send” to back up the root device to a USB stick and then restore it with “btrfs receive” after testing. It is a high-risk operation.
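For anyone who wants to see what the backup half of that might look like, here is a rough sketch, not a tested procedure for the Omnia. The snapshot path `/backup-snap` and the USB mount point `/mnt/usb` are made-up placeholders, and the USB stick is assumed to be formatted as btrfs (btrfs receive needs a btrfs destination, and btrfs send needs a read-only snapshot as its source). By default it only prints the commands instead of running them.

```python
import subprocess

def backup_commands(source="/", snapshot="/backup-snap", dest="/mnt/usb"):
    """Build the three btrfs commands for a send/receive backup.

    All three paths are hypothetical placeholders; adjust them to your setup.
    """
    # btrfs send only accepts read-only snapshots, hence the -r flag.
    snap = ["btrfs", "subvolume", "snapshot", "-r", source, snapshot]
    send = ["btrfs", "send", snapshot]
    # The destination must be a mounted btrfs filesystem.
    recv = ["btrfs", "receive", dest]
    return snap, send, recv

def run_backup(dry_run=True, **paths):
    """Print the commands (dry_run=True) or execute them as a pipeline."""
    snap, send, recv = backup_commands(**paths)
    if dry_run:
        print(" ".join(snap))
        print(" ".join(send) + " | " + " ".join(recv))
        return
    subprocess.run(snap, check=True)
    # Pipe the output of "btrfs send" into "btrfs receive".
    sender = subprocess.Popen(send, stdout=subprocess.PIPE)
    subprocess.run(recv, stdin=sender.stdout, check=True)
    sender.stdout.close()
    if sender.wait() != 0:
        raise RuntimeError("btrfs send failed")
```

Calling `run_backup()` with no arguments just shows what would be executed. The restore direction is deliberately left out, since getting the router to boot from a received subvolume depends on details not covered in this thread.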

If you only want to save the configuration, then backing up /etc/config and /etc/updater is enough. The Foris UI has backup and restore functionality in 3.4 to help back up those directories, but I understand there are some bugs in the restore part of it.
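If you would rather do the configuration backup by hand, a minimal sketch could look like the following. It assumes Python is available on the router; the function name `backup_dirs` and the archive path in the usage note are illustrative, not part of any Turris tool. Directories that do not exist are skipped.

```python
import tarfile
from pathlib import Path

def backup_dirs(dirs, archive):
    """Pack the given directories into a gzip-compressed tar archive.

    Returns the list of directories that were actually saved.
    """
    saved = []
    with tarfile.open(str(archive), "w:gz") as tar:
        for d in dirs:
            path = Path(d)
            if not path.exists():
                continue  # skip missing directories instead of failing
            # Store paths relative to / so the archive can be unpacked anywhere.
            tar.add(str(path), arcname=path.as_posix().lstrip("/"))
            saved.append(str(path))
    return saved
```

For example, `backup_dirs(["/etc/config", "/etc/updater"], "/tmp/omnia-config.tar.gz")` would collect both directories into one archive, which you can then copy off the router before flashing.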

Tomas/Turris team, did this fix make it into the upcoming 3.5 release (which is coming out Thursday)? While I can use the fix-switch binary to get around it, it would be great to have the fix built in.

No, it is not tested enough to be pulled. However, it is going to be in nightly today or tomorrow, so it goes into the next release…

I have more to back up than just configuration, so a snapshot is easier for me. Tomas, I will keep an eye out for this to hit nightly; once it does, I will switch to that branch, test thoroughly, and roll back via snapshot. As I understand it, the GitLab activity will show when this moves to the test branch. Is that correct?

It was already moved into the test (= nightly) branch, but it depends on when it will be compiled. :slight_smile:

Pepe, yes, I saw that in GitLab.

Hi - did the fix make it into last night’s 3.5 release?

No, it did not; see user brill’s comment above.

However, I did pull the January 13th nightly onto my Omnia (which contains the fix for this) and tested it. It does fix the problem (I rebooted and power cycled the router multiple times and all ports came up, when that wasn’t the case for me before without the fix-switch binary), so it looks likely that the final fix will be in the 3.6 release.

The fix for this is now incorporated into the 3.5.2 release, which was pushed today. Confirmed that it resolved the issue for me (after I removed the fix-switch workaround).

I am not sure if this is related, but when there is a power loss, one of my devices has no link after boot. I have to physically disconnect the cable, usually on both ends. The problem appeared when I moved from Turris 1.0 to Omnia, and it is consistent across the Ethernet ports on the Omnia. No other device has a problem.