Module E (8-port switch) often fails to initialize on boot if connected to module F (USB)

Hello,

I have a Turris MOX (board v22, 512 MiB RAM) with modules AGFED, a WiFi card in module G, and an empty SFP cage in module D. When the device is (re)booted after it has warmed up with both modules E and F attached, module E fails to initialize.

On power-on from cold (the device has been off for a while, typically after assembly or reconfiguration) and within the first several hours of running, it boots correctly and all modules, including module E, are available and working. However, once it has been running for a few hours or days, either a cold reboot (power off and on) or a warm (software) reboot leaves the boot loader unable to initialize module E (the switch): the Ethernet port LEDs briefly blink all together (not sequentially in a circle as usual), the boot loader prints “Check of switch MDIO address failed for 0x10”, the kernel logs “mv88e6085: probe of d0032004.mdio-mii:10 failed with error -110” (see below), and there are no lan* interfaces. After the device has been powered off for an unpredictable number of hours, it boots correctly again.

If module F (USB3) is left out, i.e., module E is connected directly to module G, everything runs smoothly (for many months). The issue showed up right after I bought module F, so the configuration with module F has never worked without it; all other modules and the device itself were also new. The issue has persisted from early versions of the Turris software to the current stable version, and updating the firmware has not helped either.

  • Is anybody successfully running the same HW configuration (AGFED)?
  • Can it be caused by a defect in my module F (or E), or in the MOX bus?
  • Is the MOX bus too long in my configuration (AGED works, AGFED does not)?
  • Would Turris support help? RMA?

Any help would be appreciated. In the worst case, I can send the whole device in for RMA or just sell module F; however, it should work as it is.

Booting:

TIM-2.0
CZ.NIC's Armada 3720 Secure Firmware v2022.06.11 (Jul 11 2022 13:37:39)
Running on Turris MOX
Initializing DDR3 (512 MiB)... done
NOTICE:  Booting Trusted Firmware
NOTICE:  BL1: v2.7(release):fce4b6f4d8
NOTICE:  BL1: Built : 12:25:08, Aug 15 2022
NOTICE:  BL1: Booting BL2
NOTICE:  BL2: v2.7(release):fce4b6f4d8
NOTICE:  BL2: Built : 12:25:08, Aug 15 2022
NOTICE:  BL1: Booting BL31
NOTICE:  BL31: v2.7(release):fce4b6f4d8
NOTICE:  BL31: Built : 12:25:08, Aug 15 2022


U-Boot 2022.07 (Aug 15 2022 - 12:25:08 +0000), Build: jenkins-turris-os-packages-lions-mox-1925

DRAM:  512 MiB
Core:  54 devices, 27 uclasses, devicetree: separate
WDT:   Started watchdog@8300 with servicing (60s timeout)
Comphy chip #0:
Comphy-0: SGMII1        1.25 Gbps
Comphy-1: PEX0          5 Gbps
Comphy-2: USB3_HOST0    5 Gbps
PCIe: Link up
MMC:   sdhci@d0000: 1, sdhci@d8000: 0
Loading Environment from SPIFlash... SF: Detected w25q64dw with page size 256 Bytes, erase size 4 KiB, total 8 MiB
OK
Model: CZ.NIC Turris Mox Board
  Board version: 22
  RAM size: 512 MiB
  Serial Number: [redacted]
  ECDSA Public Key: [redacted]
  SD/eMMC version: SD
Module Topology:
   1: Passthrough Mini-PCIe Module
   2: USB 3.0 Module (4 ports)
   3: Peridot Switch Module (8-port)
   4: SFP Module
Net:   eth0: ethernet@30000, eth1: ethernet@40000
Check of switch MDIO address failed for 0x10
Hit any key to stop autoboot:  0
gpio: pin GPIO221 (gpio 57) value is 0
gpio: pin GPIO220 (gpio 56) value is 1
SF: Detected w25q64dw with page size 256 Bytes, erase size 4 KiB, total 8 MiB
device 0 offset 0x7f0000, size 0x10000
SF: 65536 bytes @ 0x7f0000 Read: OK
switch to partitions #0, OK
mmc0 is current device
Scanning mmc 0:1...
Found /boot/extlinux/extlinux.conf
Retrieving file: /boot/extlinux/extlinux.conf
[...]

Kernel:

[...]
srp 23 19:51:40 ap kernel: Turris Mox serial number [redacted]
srp 23 19:51:40 ap kernel:            board version 22
srp 23 19:51:40 ap kernel:            burned RAM size 512 MiB
srp 23 19:51:40 ap kernel: turris-mox-rwtm firmware:armada-3700-rwtm: HWRNG successfully registered
srp 23 19:51:40 ap kernel: crypto-safexcel d0090000.crypto: EIP97:211(0,1,4,4)-HIA:223(0,5,5),PE:132/331(alg:7fddf000)/0/0/0
srp 23 19:51:40 ap kernel: mv88e6085: probe of d0032004.mdio-mii:10 failed with error -110
srp 23 19:51:40 ap kernel: mt7915e 0000:03:00.0: enabling device (0000 -> 0002)
[...]
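
For reference, the -110 in the mv88e6085 line is the Linux errno ETIMEDOUT, which is consistent with the switch not answering on the MDIO bus. Below is a minimal sketch for picking such lines out of a saved kernel log. The function name find_probe_failures and the small errno table are illustrative assumptions, not part of any Turris tool.

```python
import re

# Small subset of Linux errno values (asm-generic numbering); extend as needed.
ERRNO_NAMES = {5: "EIO", 19: "ENODEV", 110: "ETIMEDOUT"}

# Matches kernel lines such as "<driver>: probe of <device> failed with error -<n>".
PROBE_FAIL = re.compile(
    r"(?P<driver>\S+): probe of (?P<device>\S+) failed with error -(?P<errno>\d+)"
)


def find_probe_failures(log_text):
    """Return a (driver, device, errno name) tuple for each failed probe line."""
    failures = []
    for line in log_text.splitlines():
        match = PROBE_FAIL.search(line)
        if match:
            code = int(match.group("errno"))
            name = ERRNO_NAMES.get(code, f"errno {code}")
            failures.append((match.group("driver"), match.group("device"), name))
    return failures


sample = (
    "srp 23 19:51:40 ap kernel: mv88e6085: probe of "
    "d0032004.mdio-mii:10 failed with error -110"
)
for driver, device, name in find_probe_failures(sample):
    print(f"{driver} failed to probe {device}: {name}")
```

Running it against the output of journalctl -k or dmesg after each reboot gives a quick yes/no on whether the switch came up.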

I have just looked at the Turris MOX U-Boot sources. In my case, it seems that all the modules are detected correctly by the first part of int last_stage_init(void), but after that, check_switch_address(bus, 0x10 + i) for the Peridot switch module E fails in sw_scratch_read(...).

So it could be either a U-Boot software issue or a MOX bus hardware issue.
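
To illustrate how topology detection can succeed while the switch address check still fails, here is a toy model of such a check. It is not the U-Boot code: the FakeMdioBus class, the SCRATCH_REG number, the "address shifted left by 3" encoding, and the convention that an unreachable device reads back as 0xFFFF are all assumptions made for this sketch.

```python
SCRATCH_REG = 0x70  # hypothetical register that echoes the switch's own address


class FakeMdioBus:
    """Toy MDIO bus mapping (address, register) to a 16-bit value."""

    def __init__(self, devices, reachable=True):
        self.devices = devices      # {addr: {reg: value}}
        self.reachable = reachable  # False models a broken MDIO path upstream

    def read(self, addr, reg):
        if not self.reachable:
            return 0xFFFF           # assumed: nothing drives the line, all ones
        return self.devices.get(addr, {}).get(reg, 0xFFFF)


def check_switch_address(bus, addr):
    """Return True if the switch at addr reports the address we expect."""
    value = bus.read(addr, SCRATCH_REG) & 0xFF
    return (value >> 3) == addr


def report(bus, addr):
    """Return the failure message, or None if the check passes."""
    if check_switch_address(bus, addr):
        return None
    return f"Check of switch MDIO address failed for 0x{addr:02x}"


switch = {0x10: {SCRATCH_REG: 0x10 << 3}}
healthy = FakeMdioBus(switch)
faulty = FakeMdioBus(switch, reachable=False)

print(report(healthy, 0x10))
print(report(faulty, 0x10))
```

In this model the switch itself is fine in both cases; only the path to it differs, which is why swapping or removing an intermediate module is a reasonable test before suspecting software.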


The device is back from RMA. The cause was an MDIO bus malfunction in module F (USB), so MDIO could not reach module E. Module F has been replaced and everything is working like a charm now.
