TurrisOS 5.3.3 randomly reboots

Ever since upgrading to 5.3.3 (TurrisOS 5.3.3 f14bc5cf5635edbb3ab2e29c14a260e2640e588f r11388+90-f14bc5cf56), the router has been restarting at random, seemingly for no reason. The logs don’t provide any useful information:

2022-01-13 16:43:48 err kernel[]: [140204.639680] mvneta f1034000.ethernet eth2: bad rx status 0fa10000 (crc error), size=1405
2022-01-13 16:44:01 info crond[19693]: (root) CMD (/usr/bin/rainbow_button_sync.sh)
2022-01-13 16:44:01 info crond[19692]: (root) CMDEND (/usr/bin/rainbow_button_sync.sh)
2022-01-13 16:44:37 err kernel[]: [140254.041845] mvneta f1034000.ethernet eth2: bad rx status 0f810000 (crc error), size=1468
2022-01-13 16:44:40 err kernel[]: [140256.386651] mvneta f1034000.ethernet eth2: bad rx status 0fa10000 (crc error), size=1405
2022-01-13 16:44:43 err kernel[]: [140259.292264] mvneta f1034000.ethernet eth2: bad rx status 0fa10000 (crc error), size=1405
2022-01-13 16:45:01 info crond[19788]: (root) CMD (/usr/bin/notifier)
2022-01-13 16:45:01 info crond[19789]: (root) CMD (/usr/bin/rainbow_button_sync.sh)
2022-01-13 16:45:01 info crond[19787]: (root) CMDEND (/usr/bin/rainbow_button_sync.sh)
2022-01-13 16:45:01 info crond[19786]: (root) CMDOUT (There is no message to send.)
2022-01-13 16:45:01 info crond[19786]: (root) CMDEND (/usr/bin/notifier)
2022-01-13 16:45:05 err kernel[]: [140281.883495] mvneta f1034000.ethernet eth2: bad rx status 4fa10000 (crc error), size=117
2022-01-13 16:46:01 info crond[19893]: (root) CMD (/usr/bin/rainbow_button_sync.sh)
2022-01-13 16:46:01 info crond[19892]: (root) CMDEND (/usr/bin/rainbow_button_sync.sh)
2022-01-13 16:47:01 info crond[19972]: (root) CMD (/usr/bin/rainbow_button_sync.sh)
2022-01-13 16:47:01 info crond[19971]: (root) CMDEND (/usr/bin/rainbow_button_sync.sh)
2022-01-13 16:48:01 info crond[20061]: (root) CMD (/usr/bin/rainbow_button_sync.sh)
2022-01-13 16:48:01 info crond[20060]: (root) CMDEND (/usr/bin/rainbow_button_sync.sh)
2022-01-13 16:49:01 info crond[20145]: (root) CMD (/usr/bin/rainbow_button_sync.sh)
2022-01-13 16:49:01 info crond[20144]: (root) CMDEND (/usr/bin/rainbow_button_sync.sh)
2022-01-13 16:50:01 info crond[20232]: (root) CMD (/usr/bin/notifier)
2022-01-13 16:50:01 info crond[20233]: (root) CMD (/usr/bin/rainbow_button_sync.sh)
2022-01-13 16:50:01 info crond[20231]: (root) CMDEND (/usr/bin/rainbow_button_sync.sh)
2022-01-13 16:50:01 info crond[20230]: (root) CMDOUT (There is no message to send.)
2022-01-13 16:50:01 info crond[20230]: (root) CMDEND (/usr/bin/notifier)
2022-01-13 16:51:01 info crond[20344]: (root) CMD (/usr/bin/rainbow_button_sync.sh)
2022-01-13 16:51:01 info crond[20343]: (root) CMDEND (/usr/bin/rainbow_button_sync.sh)
2022-01-13 16:52:33 info kernel[]: [    0.000000] Booting Linux on physical CPU 0x0
2022-01-13 16:52:33 notice kernel[]: [    0.000000] Linux version 4.14.254 (packaging@turris.cz) (gcc version 7.5.0 (OpenWrt GCC 7.5.0 r11388+90-f14bc5cf56)) #0 SMP Fri Dec 10 08:30:43 2021
2022-01-13 16:52:33 info kernel[]: [    0.000000] CPU: ARMv7 Processor [414fc091] revision 1 (ARMv7), cr=10c5387d
2022-01-13 16:52:33 info kernel[]: [    0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
2022-01-13 16:52:33 info kernel[]: [    0.000000] OF: fdt: Machine model: Turris Omnia
2022-01-13 16:52:33 info kernel[]: [    0.000000] Memory policy: Data cache writealloc

Does anyone have any idea where I should be looking to troubleshoot?

Does that happen randomly or after an ISP disconnection?
My ISP disconnects after 24 hours, and the Omnia sometimes crashes at that time.

I believe it’s unrelated to ISP. I’m actually behind another router which runs continuously without issues.

I am on TurrisOS 5.3.2, Turris Omnia. My system crashes / reboots 3-4 times a week. I have also observed that it could be related to ISP / provider disconnects. I send my logs to an external syslog server, but there is no real indicator.

Does anyone have an idea how to troubleshoot the issue further (crashlogs etc.)?

@mina86 is the router in front of the Turris appliance responsible for the xDSL dial-in, or is it just an IP interface? The situation is really annoying because of remote work, many web sessions with customers, etc. I no longer feel I can rely on my Turris appliance. I really want to fix the issue rather than switch to another solution, because on specs it is still a good system.

In my case I am behind a standard VDSL2+ modem, and the Turris is responsible for the PPPoE dial-in. There is no strict time pattern to the reboots.

My crashlog is empty / file is not present

cat /sys/kernel/debug/crashlog
cat: can't open '/sys/kernel/debug/crashlog': No such file or directory
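
As far as I understand, on OpenWrt-based systems that file sits under debugfs and is only created when the kernel managed to preserve a log from the previous crash, so a missing file may simply mean nothing was saved. A minimal sketch for checking several candidate locations at once; the function name find_crash_records is made up for illustration, and /sys/fs/pstore is an assumption (it is only populated if the kernel has pstore/ramoops set up, which I have not verified on the Omnia).

```shell
# find_crash_records: report which of the given paths hold crash data.
# Hypothetical helper. Returns 0 if anything was found, 1 otherwise.
find_crash_records() {
    found=1
    for p in "$@"; do
        if [ -f "$p" ]; then
            # A regular file at this path counts as a saved crash log.
            echo "crash data in file: $p"
            found=0
        elif [ -d "$p" ] && [ -n "$(ls -A "$p" 2>/dev/null)" ]; then
            # A directory only counts if it is not empty.
            echo "crash data in directory: $p"
            found=0
        fi
    done
    return "$found"
}

# Example (paths are assumptions, see above):
# find_crash_records /sys/kernel/debug/crashlog /sys/fs/pstore
```

If this prints nothing, there is no saved crash record in those places and the serial console is probably the next thing to try.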

I remember that there is a setting which tells the system to reboot if an interface (WAN etc.) is down for X amount of time or attempts, but I cannot find it anymore, and I do not think I ever set anything up for it.
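
The setting described here sounds like the OpenWrt watchcat package, but that is an assumption and not something confirmed in this thread. Without relying on any particular package, a rough sketch for finding anything that could trigger a reboot is to search the configuration and cron directories for the word reboot. The function name find_reboot_triggers and the directory list in the example are illustrative only.

```shell
# find_reboot_triggers: list every line mentioning "reboot" under the
# given files or directories, with file name and line number.
# Hypothetical helper; unreadable or missing paths are silently skipped.
find_reboot_triggers() {
    grep -rn 'reboot' "$@" 2>/dev/null
}

# Example (directory list is an assumption):
# find_reboot_triggers /etc/config /etc/crontabs
```

No output means nothing in those directories mentions a reboot, which would make a configured watchdog-style reboot less likely.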

@fantomas do you have an idea what the issue with your ISP could be that leads to the reboot? I saw that you posted your situation here (Reboot at night - why? - #22 by fantomas). Did the analysis of the buddyinfo provide you with valuable information?

My feeling is that many users are facing these issues. But this should not be: stable uptime is one of the core functions of a router.

In my case Omnia works as a pure IP router/switch.

If the crash causes a sudden reboot, there’s literally no way to know what happened. Perhaps a serial console could display it, but I’m not sure.

The only way I found that the ISP disconnect led to the crash was the time at which it happened.
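
Lacking a crash trace, one way to line up reboot times with other events is to print the last log line written before each boot message, which gives the approximate time of the crash. A minimal awk sketch; the function name last_before_boot is hypothetical, and it assumes the boot message looks like the "Booting Linux on physical CPU" lines quoted in this thread.

```shell
# last_before_boot: for every boot message in the given log files, print
# the line logged just before it, the boot line itself, and a separator.
# Hypothetical helper; expects uncompressed, chronologically ordered logs.
last_before_boot() {
    awk '
        /Booting Linux on physical CPU/ {
            if (prev != "") print prev   # last entry before the reboot
            print                        # first entry after the reboot
            print "--"
        }
        { prev = $0 }
    ' "$@"
}

# Example: last_before_boot /path/to/messages
```

The gap between the two timestamps in each pair brackets when the router went down, which can then be compared with the ISP's disconnect time.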

Thank you both. 4 crashes in the last 4 days. Random times / no indication in the logs (at least for me).

I was one of the early supporters of the platform on Indiegogo (23.11.15). I really love the platform for its specs, history, ideas and values, but it seems that I cannot change the situation and have to move on. The platform was not always reliable in the past, but with the last releases in the 3.11 branch it was quite OK. Since the automatic migration (my fault, I had automatic updates enabled) the platform has not been reliable anymore. There is nothing special on my platform: no Pakon / Suricata or other software that demands intensive hardware resources.

Maybe I have to move on. 6 years is not that bad from a lifecycle point of view, but it is sad because at its core I still like the platform (shell etc.). I think I will give MikroTik (RB4011iGS) a try.

Does it make sense to open up a support ticket at turris.cz?

Last 4 days.

The logs before the “Booting” log are just standard logs (by crond etc.).

# zcat all.log* | grep waechter | grep Booting
zcat: all.log: not in gzip format
Jan 29 22:51:37 <user.info> waechter waechter kernel: [    0.000000] Booting Linux on physical CPU 0x0
Jan 29 22:51:37 <user.info> waechter waechter kernel: [    0.001480] Booting CPU 1
Jan 28 06:03:56 <user.info> waechter waechter kernel: [    0.000000] Booting Linux on physical CPU 0x0
Jan 28 06:03:56 <user.info> waechter waechter kernel: [    0.001477] Booting CPU 1
Jan 27 22:39:31 <user.info> waechter waechter kernel: [    0.000000] Booting Linux on physical CPU 0x0
Jan 27 22:39:31 <user.info> waechter waechter kernel: [    0.001480] Booting CPU 1
Jan 25 19:01:36 <user.info> waechter waechter kernel: [    0.000000] Booting Linux on physical CPU 0x0
Jan 25 19:01:36 <user.info> waechter waechter kernel: [    0.001498] Booting CPU 1
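
The "zcat: all.log: not in gzip format" line above appears because the current, uncompressed log is also passed to zcat. A minimal sketch that reads both rotated (.gz) and plain log files and keeps only one line per reboot; the function name list_boots is hypothetical, and it assumes rotated files end in .gz.

```shell
# list_boots: print the first kernel boot line of every reboot found in
# the given log files, decompressing only files whose name ends in .gz.
# Hypothetical helper; non-existent paths are skipped.
list_boots() {
    for f in "$@"; do
        [ -f "$f" ] || continue
        case "$f" in
            *.gz) zcat "$f" ;;
            *)    cat "$f" ;;
        esac
    done | grep 'Booting Linux on physical CPU'
}

# Example: list_boots all.log*
```

Matching on the full "Booting Linux on physical CPU" message also drops the "Booting CPU 1" lines, so each reboot shows up exactly once.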

If someone wonders:

# cat /etc/turris-version
5.3.4

Did you reach out to our technical support department, so we can take a closer look at what is happening and whether it is caused by your configuration or not? Did you try flashing a new version and configuring it from scratch to rule that out?

Thanks Pepe. I will contact your support. I have not done a factory reset and rebuilt everything from scratch. Maybe this could be an option, but I really cannot imagine that a configuration issue would cause that behavior. It seems that more users than just me have a similar issue; is it probable that we all have config issues? But maybe I have to try it. Unfortunately it will take some time.

I opened a support ticket. Support was good so far, but we did not really find the source of the behavior. With 5.3.5, however, the Turris is stable again: up for 8 days now, no random reboots. I hope that this is the situation for all others here in the thread. Just wanted to let you know.

I got my last unexpected reboot this morning, after 10 days…

These sound very suspicious. Do you guys have surge protection in front of the AC adapter on your Omnias?

Might be a power surge/fluctuation/momentary loss causing these reboots…

No. My problem usually happens after the daily disconnect from my ISP and is described in: