Boot loop after reboot issue workaround

I found my MOX in a reboot loop this morning (A+B+E+C).

This is the output from the serial port:
=~=~=~=~=~=~=~=~=~=~=~= PuTTY log 2022.01.27 06:42:19 =~=~=~=~=~=~=~=~=~=~=~=

U-Boot 2018.11 (Dec 16 2018 - 12:50:19 +0000), Build: jenkins-turris-os-packages-kittens-mox-90

DRAM: 1 GiB
Enabling Armada 3720 wComphy-0: SGMII1 3.125 Gbps
Comphy-1: PEX0 5 Gbps
Comphy-2: USB3_HOST0 5 Gbps
MMC: sdhci@d8000: 0
Loading Environment from SPI Flash… SF: Detected w25q64dw with page size 256 Bytes, erase size 4 KiB, total 8 MiB
OK
Model: CZ.NIC Turris Mox Board
Net: eth0: neta@30000
Turris Mox:
Board version: 22
RAM size: 1024 MiB
Serial Number: xyz
ECDSA Public Key: xyz
SD/eMMC version: SD
Module Topology:
1: Mini-PCIe Module
2: Peridot Switch Module (8-port)
3: Topaz Switch Module (4-port)

Hit any key to stop autoboot: 2 1 0
gpio: pin GPIO221 (gpio 57) value is 0
gpio: pin GPIO220 (gpio 56) value is 1
SF: Detected w25q64dw with page size 256 Bytes, erase size 4 KiB, total 8 MiB
device 0 offset 0x7f0000, size 0x10000
SF: 65536 bytes @ 0x7f0000 Read: OK
switch to partitions #0, OK
mmc0 is current device
Scanning mmc 0:1…
Found U-Boot script /boot.scr
2070 bytes read in 73 ms (27.3 KiB/s)

Executing script at 04d00000

19554 bytes read in 68 ms (280.3 KiB/s)
10280960 bytes read in 512 ms (19.1 MiB/s)

Flattened Device Tree blob at 04f00000

Booting using the fdt blob at 0x4f00000
Loading Device Tree to 000000003bf14000, end 000000003bf1bc61 … OK

Starting kernel …

After a few minutes, it just reboots.
I can't access the device.
How can I back up my /etc config from the SD card? I'm afraid of having to set everything up again after flashing something.
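One way to answer the backup question is to put the SD card into another Linux machine and archive `etc` from there. The following is only a minimal sketch: the device name, the mount point, and the function name `backup_etc` are placeholders, and it assumes the card's Btrfs partition keeps the running system in the `@` directory mentioned later in this thread.

```shell
# Sketch: archive /etc from an SD card that is already mounted on another Linux machine.
# Assumption: the card's Btrfs partition was mounted beforehand, for example
#   mount /dev/sdX1 /mnt/moxsd     (device name and mount point are placeholders)
# and the system to save lives in a directory such as /mnt/moxsd/@ or /mnt/moxsd/@16.

backup_etc() {
    # $1 = directory containing the root filesystem (e.g. /mnt/moxsd/@)
    # $2 = path of the tar.gz archive to create
    backup_root=$1
    backup_archive=$2
    if [ ! -d "$backup_root/etc" ]; then
        echo "no etc directory under $backup_root" >&2
        return 1
    fi
    # -C keeps the paths inside the archive relative ("etc/...").
    tar -czf "$backup_archive" -C "$backup_root" etc
}

# Example call (paths are placeholders):
#   backup_etc /mnt/moxsd/@ "$HOME/mox-etc-backup.tar.gz"
```

Because the archive is created on the other machine, the card itself is only read, so flashing it afterwards does not lose the configuration.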

I read [reboot-issue-workaround-v4] and downloaded both files. But since the device is inaccessible, how can I execute the commands mentioned there:

mtd write trusted-secure-firmware.bin secure-firmware
mtd write a53-firmware.bin a53-firmware || mtd write a53-firmware.bin u-boot ?

I just flashed another SD card, booted from it and executed the workaround mtd commands.
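The `||` fallback in the second command can also be made explicit by checking which partition name exists before writing. This is a rough sketch, not the official procedure: the helper names are made up for illustration, and it assumes `/proc/mtd` lists partitions in the usual `mtdN: <size> <erasesize> "<name>"` form.

```shell
# Sketch: choose the MTD partition for a53-firmware.bin, mirroring the
# "a53-firmware || u-boot" fallback from the workaround commands.
# Assumption: /proc/mtd lines look like   mtd1: 00160000 00010000 "u-boot"

has_mtd_partition() {
    # $1 = partition name, $2 = file in /proc/mtd format (defaults to /proc/mtd)
    # The surrounding quotes make "u-boot" not match "u-boot-env".
    grep -q "\"$1\"" "${2:-/proc/mtd}"
}

pick_a53_partition() {
    # Prints the first partition name that exists, or fails if neither does.
    mtd_table=${1:-/proc/mtd}
    for mtd_name in a53-firmware u-boot; do
        if has_mtd_partition "$mtd_name" "$mtd_table"; then
            echo "$mtd_name"
            return 0
        fi
    done
    return 1
}

# On the MOX booted from the second SD card (as root, in the directory
# holding both downloaded files):
#   mtd write trusted-secure-firmware.bin secure-firmware
#   mtd write a53-firmware.bin "$(pick_a53_partition)"
```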

With the original SD card (the one with the failed update) I now get this error:
U-Boot 2021.10-rc3-00050-g7d3fea2c7f (Sep 07 2021 - 18:16:56 +0200)

DRAM: 1 GiB
WDT: Started with servicing (60s timeout)
Comphy chip #0:
Comphy-0: SGMII1 3.125 Gbps
Comphy-1: PEX0 5 Gbps
Comphy-2: USB3_HOST0 5 Gbps
PCIE-0: Link up
MMC: sdhci@d8000: 0
Loading Environment from SPIFlash… SF: Detected w25q64dw with page size 256 Bytes, erase size 4 KiB, total 8 MiB
OK
Model: CZ.NIC Turris Mox Board
Net: eth0: neta@30000
Turris Mox:
Board version: 22
RAM size: 1024 MiB
Serial Number: xyz
ECDSA Public Key: xyz
SD/eMMC version: SD
Module Topology:
1: Mini-PCIe Module
2: Peridot Switch Module (8-port)
3: Topaz Switch Module (4-port)

Hit any key to stop autoboot: 2 1 0
gpio: pin GPIO221 (gpio 57) value is 0
gpio: pin GPIO220 (gpio 56) value is 1
SF: Detected w25q64dw with page size 256 Bytes, erase size 4 KiB, total 8 MiB
device 0 offset 0x7f0000, size 0x10000
SF: 65536 bytes @ 0x7f0000 Read: OK
switch to partitions #0, OK
mmc0 is current device
Scanning mmc 0:1…
Found U-Boot script /boot.scr
2070 bytes read in 90 ms (22.5 KiB/s)

Executing script at 04d00000

19554 bytes read in 102 ms (186.5 KiB/s)
10448904 bytes read in 569 ms (17.5 MiB/s)
Moving Image from 0x5000000 to 0x5080000, end=5acb000

Flattened Device Tree blob at 04f00000

Booting using the fdt blob at 0x4f00000
Loading Device Tree to 000000003faf5000, end 000000003fafcc61 … OK

Starting kernel …

"Synchronous Abort" handler, esr 0x02000000
elr: ffffffffc5a9d000 lr : 0000000000003234 (reloc)
elr: 00000000059b0000 lr : 000000003ff16234
x0 : 000000003faf5000 x1 : 0000000000000000
x2 : 0000000000000000 x3 : 0000000000000000
x4 : 0000000005080000 x5 : 0000000000000001
x6 : 0000000000000008 x7 : 0000000000000000
x8 : 0000000005080000 x9 : 0000000000000002
x10: 000000000a200023 x11: 0000000000000002
x12: 0000000000000002 x13: 000000003faf8fff
x14: 000000003faf5000 x15: 000000003ff14f84
x16: 000000003ff5a230 x17: 0000000000000000
x18: 000000003fb02dc0 x19: 000000003ffe3f80
x20: 0000000000000000 x21: 0000000000000400
x22: 0000000000000003 x23: 000000003fbfacf8
x24: 000000003fbfacf8 x25: 000000003ffcdae8
x26: 0000000000000000 x27: 000000003ff1625c
x28: 000000003fbfad20 x29: 000000003fafe480

Code: 00000014 00000000 00000000 00000000 (088f16c8)
Resetting CPU …

resetting …

And again a boot loop.
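For anyone reading the abort dump: the `esr` value can be split into its fields with plain shell arithmetic. This is only a reading aid, assuming the standard ARMv8 ESR layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0); the function name is made up for illustration.

```shell
# Sketch: split an ARMv8 ESR value into exception class (EC),
# instruction length bit (IL) and syndrome (ISS).
decode_esr() {
    # $1 = ESR value, e.g. 0x02000000
    esr_val=$(( $1 ))
    esr_ec=$(( (esr_val >> 26) & 0x3f ))
    esr_il=$(( (esr_val >> 25) & 1 ))
    esr_iss=$(( esr_val & 0x1ffffff ))
    printf 'EC=0x%02x IL=%d ISS=0x%07x\n' "$esr_ec" "$esr_il" "$esr_iss"
}

decode_esr 0x02000000   # prints: EC=0x00 IL=1 ISS=0x0000000
```

EC 0x00 is the "Unknown reason" class, so the `esr` value by itself does not say why the kernel entry failed.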
The second SD card with the fresh image boots, and I have to go through the first setup steps again.

Any help getting the device back to life?

I tried copying @16 to the second SD card, which let the system boot.
But I can't use schnapps rollback 16:
root@mox:/# schnapps list -j
{ "snapshots": [
{ "number": 16, "type": "pre", "size": "313.04MiB", "created": "2022-01-25 17:10:32 +0100", "description": "Automatic pre-update snapshot (TurrisOS 5.3.3)" }
] }

root@mox:/# schnapps rollback 16
ERROR: Not a Btrfs subvolume: Invalid argument
Rolling back failed!
root@mox:/# [ 60.633457] BTRFS info (device mmcblk1p1): qgroup scan completed (inconsistency flag cleared)
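The "Not a Btrfs subvolume" error would be consistent with @16 having arrived on the second card as an ordinary directory (for example via a plain file copy) rather than as a real subvolume; that is a guess about how the copy was made. A quick check is sketched below. It assumes a GNU-style `stat` and relies on the fact that the root of a Btrfs subvolume has inode number 256; the function name is hypothetical.

```shell
# Sketch: test whether a path is the root of a Btrfs subvolume.
# Assumption: GNU-style stat (-f -c %T for filesystem type, -c %i for inode).
# Where btrfs-progs is installed, "btrfs subvolume show <path>" is an alternative.
is_btrfs_subvolume() {
    # $1 = path to test
    [ -d "$1" ] || return 1
    [ "$(stat -f -c %T "$1")" = "btrfs" ] || return 1
    # Subvolume roots always carry inode number 256 on Btrfs.
    [ "$(stat -c %i "$1")" = "256" ]
}

# Example (mount point is a placeholder):
#   is_btrfs_subvolume /mnt/newsd/@16 && echo "subvolume" || echo "plain directory"
```

If the check reports a plain directory, schnapps has nothing it can roll back to, which would match the error above.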

I also tried copying @16/etc and @/root from the original SD card (the one that fails to boot after the update) into the /@ directory of the second one. The MOX boots and I can SSH into it, but some software is missing and the web server is not active, so I can't use the LuCI GUI.

My workaround was to flash a new SD card with a fresh installation, install these packages:
vpn-policy-routing
luci-app-vpn-policy-routing
wireguard
wireguard-tools
luci-app-wireguard
luci-i18n-wireguard-de
relayd
luci-proto-relayd
tmux
and then copy the /etc/config files (except the foris file) from @16/etc/config on the old SD card to the fresh installation.
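The workaround above could be scripted roughly like this. It is a sketch under assumptions: the packages are installed with `opkg` (the post does not say which tool was used), the old card is mounted at a placeholder path, and `restore_config` is a made-up helper name.

```shell
# Sketch: reinstall the listed packages and copy the old config files,
# skipping the "foris" file as described in the workaround.
PACKAGES="vpn-policy-routing luci-app-vpn-policy-routing wireguard wireguard-tools luci-app-wireguard luci-i18n-wireguard-de relayd luci-proto-relayd tmux"

restore_config() {
    # $1 = old config directory (e.g. /mnt/oldsd/@16/etc/config)
    # $2 = target config directory (e.g. /etc/config)
    cfg_src=$1
    cfg_dst=$2
    for cfg_file in "$cfg_src"/*; do
        # Skip anything that is not a regular file (also covers an empty directory).
        if [ ! -f "$cfg_file" ]; then
            continue
        fi
        cfg_name=$(basename "$cfg_file")
        if [ "$cfg_name" = "foris" ]; then
            continue
        fi
        cp -p "$cfg_file" "$cfg_dst/$cfg_name"
    done
}

# On the freshly installed MOX (as root; opkg and the mount path are assumptions):
#   opkg update && opkg install $PACKAGES
#   restore_config /mnt/oldsd/@16/etc/config /etc/config
```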

Sadly, since I can no longer trust automatic updates from Turris, I disabled this feature.


Excellent self-help! The published instructions could also help others.

It would be great if someone could add a step-by-step solution for repairing a system with a non-bootable SD card, as mentioned in my first post.

The steps should start at this point from the bootlog/boot process:
“Hit any key to stop autoboot: 2 1 0”

If I have UART console access, what additional steps are needed in this situation to get the system working?

Please see the documentation - rescue modes.

Thanks for the reply. The documentation, especially the rescue modes section, was my first hope, but it is not helpful. I don't know who uses this procedure, but I don't do anything on a device that I cannot fully control. Pressing an unexposed button with a paper clip while plugging in the power jack, counting seconds and hoping to get the correct mode is one of the things I cannot fully control. I don't recommend it to anyone.

The important part of this documentation, the rescue shell, is missing. There is only a hint about it: “This option is for true geeks”. As mentioned above, a step-by-step guide on how to repair or recover the system from the rescue shell would be great, not just flashing a new image. The challenge is fixing the existing system on the SD card.
