Switch ports offline if plugged during boot

Same here. When I reboot my Omnia (via the reboot command, the reboot button, or power cycling), sometimes just a few switch ports don’t work, sometimes no switch port works, and sometimes all ports work. The WAN port always works (I’m using SFP). I’ve tried connecting 2 different devices with different cables, with the same result. Sometimes it helps to unplug and replug the cable, but usually not, and I have to plug the cable into a different port. The switch config is default, no tagged VLANs. Once the link is established on a port, everything works rock solid for days until the next reboot.

This is a crucial issue for me, because my Omnia sits in a hard-to-reach spot in the attic, and it’s very unpleasant to go up there just to unplug and replug cables whenever I reboot the router. Because of this I’ve set up remote access to the Omnia serial console so I can figure out what is happening. Any suggestions on what I can try when this happens? I’d like to help debug this issue.

EDIT: I just did a SW reboot and it happened again. I have two cables connected to the Omnia switch, on Port2 and Port4. Port4 has link and is working. Both SGMII CPU ports have link too. Port2 is without link (LED is off, swconfig reports link down, connection is not working). There is nothing suspicious in either dmesg or /var/log/messages. So far I’ve tried:
swconfig dev switch0 load network
swconfig dev switch0 set reset
/etc/init.d/network restart
Nothing above helped, so I did another SW reboot and now it’s working again, weird.
2davidhaluska: Did you receive any response from the Turris team yet?

Hi Dandys,

I asked whether the package was delivered without issues, and I received a response that the package arrived without damage and that they will look into it, but no response yet on the actual issue.

David

I contacted the Turris team by e-mail (info@turris.cz and tech.support@turris.cz) and got no response! :-/

I’m experiencing the same.

My guess is that the issue is somewhere around link/MDI-X negotiation (Turris can’t detect the other side’s auto-neg? possibly the cable? I’ve tested 3, but all straight; I don’t have a crossover cable at hand right now)

after reboot with: [root@turris:~]# reboot

on turris:

root@turris:~# ethtool eth0
Settings for eth0:
    Supported ports: [ TP MII ]
    Supported link modes:   1000baseT/Half 1000baseT/Full 
    Supported pause frame use: No
    Supports auto-negotiation: Yes
    Advertised link modes:  1000baseT/Half 1000baseT/Full 
    Advertised pause frame use: No
    Advertised auto-negotiation: Yes <===================================
    Link partner advertised link modes:  1000baseT/Full 
    Link partner advertised pause frame use: No
    Link partner advertised auto-negotiation: No <=======================
    Speed: 1000Mb/s
    Duplex: Full
    Port: MII
    PHYAD: 0
    Transceiver: external
    Auto-negotiation: on <==============================================
    Link detected: yes <================================================

on laptop (tg3 driver):

root@pve:~# ethtool eth0
    Settings for eth0:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Half 1000baseT/Full 
        Supported pause frame use: No
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Half 1000baseT/Full 
        Advertised pause frame use: Symmetric
        Advertised auto-negotiation: Yes <=================================
        Speed: Unknown!
        Duplex: Unknown! (255)
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on <==================================================
        MDI-X: Unknown
        Supports Wake-on: g
        Wake-on: d
        Current message level: 0x00000020 (32)
                       ifup
        Link detected: no <=====================================================

when it works (after “ethtool -s eth0 autoneg on” or ifconfig eth0 down && ifconfig eth0 up on laptop):

on turris:

root@turris:~# ethtool eth0
Settings for eth0:
    Supported ports: [ TP MII ]
    Supported link modes:   1000baseT/Half 1000baseT/Full 
    Supported pause frame use: No
    Supports auto-negotiation: Yes
    Advertised link modes:  1000baseT/Half 1000baseT/Full 
    Advertised pause frame use: No
    Advertised auto-negotiation: Yes
    Link partner advertised link modes:  1000baseT/Full 
    Link partner advertised pause frame use: No
    Link partner advertised auto-negotiation: No <===============================
    Speed: 1000Mb/s
    Duplex: Full
    Port: MII
    PHYAD: 0
    Transceiver: external
    Auto-negotiation: on
    Link detected: yes

on laptop:

root@pve:~# ethtool eth0
Settings for eth0:
    Supported ports: [ TP ]
    Supported link modes:   10baseT/Half 10baseT/Full 
                            100baseT/Half 100baseT/Full 
                            1000baseT/Half 1000baseT/Full 
    Supported pause frame use: No
    Supports auto-negotiation: Yes
    Advertised link modes:  10baseT/Half 10baseT/Full 
                            100baseT/Half 100baseT/Full 
                            1000baseT/Half 1000baseT/Full 
    Advertised pause frame use: Symmetric
    Advertised auto-negotiation: Yes
    Link partner advertised link modes:  10baseT/Half 10baseT/Full 
                                         100baseT/Half 100baseT/Full 
                                         1000baseT/Full 
    Link partner advertised pause frame use: Symmetric
    Link partner advertised auto-negotiation: Yes
    Speed: 1000Mb/s
    Duplex: Full
    Port: Twisted Pair
    PHYAD: 1
    Transceiver: internal
    Auto-negotiation: on
    MDI-X: off
    Supports Wake-on: g
    Wake-on: d
    Current message level: 0x00000020 (32)
                   ifup
    Link detected: yes <==============================

It doesn’t help when I run ethtool -r eth0 on laptop but it helps when I do “ethtool -s eth0 autoneg on” even if it is already on.

For now, as a workaround, I’m gonna put cron entry to run “ethtool -s eth0 autoneg on” every minute on my laptop.
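A slightly gentler variant of this workaround would be to re-apply the setting only while the link is actually down. A minimal sketch, assuming Linux sysfs exposes the carrier state at /sys/class/net/eth0/carrier and that ethtool is on the PATH; the function names are made up for illustration:

```python
import subprocess
from pathlib import Path


def link_is_up(carrier_text):
    """sysfs reports '1' when the interface has carrier, '0' otherwise."""
    return carrier_text.strip() == "1"


def kick_if_down(iface="eth0"):
    """Re-run 'ethtool -s <iface> autoneg on' only when there is no link.

    Returns True if the command was issued, False if the link was already up.
    """
    try:
        carrier = Path(f"/sys/class/net/{iface}/carrier").read_text()
    except OSError:
        # Reading carrier can fail while the interface is administratively
        # down; treat that the same as "no link".
        carrier = "0"
    if link_is_up(carrier):
        return False
    subprocess.run(["ethtool", "-s", iface, "autoneg", "on"], check=True)
    return True
```

Calling kick_if_down("eth0") from a script that cron runs every minute would do the same job as the unconditional cron entry, without touching a link that is already up.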

Hi Peter, I agree it can be an auto MDI/MDI-X or autonegotiation related issue. Unfortunately the output of ethtool on the Turris side won’t tell you much, because it relates only to the SGMII link between the SoC interface and the switch chip interface. I think some useful information could be obtained from the switch chip registers accessible through MDIO, but the full 88E6176 datasheet, which contains its register descriptions, is available only under NDA with Marvell :frowning: I’ve tried to get some insight by looking at the 88e6176 Linux driver, but it lacks definitions for the PHY HW-related registers, which I believe would be most helpful.

dandys: drivers/net/phy/marvell.c should have the phy register definitions. the dsa driver only manages the switch not the phy.

I am already working on this part, although without documentation. The dsa drivers export nearly all data of this switch and its phys. OpenWRT’s swconfig driver is the bare minimum to get the VLANs configured. I am currently working on getting access to the PHYs working. Reading registers from the switch itself already works somewhat (I can read its ID and version). Accessing the phy registers is somewhat crazy, as it is a double indirect access.

This is not my main priority at the moment as i currently work on some wifi and dhcp related problems with my setup.
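For readers wondering what "double indirect" could look like in practice, here is a rough sketch. Everything in it is an assumption based on how the mainline mv88e6xxx and marvell.c drivers appear to do it, not on the datasheet: a command/data register pair at 0x18/0x19 in the switch's Global 2 block, the command word layout, and register 22 as the PHY page selector. The helper names are invented, and busy-bit polling is left out.

```python
# Assumed register numbers and bit layout (from memory of the mainline
# mv88e6xxx driver headers; NOT verified against the Marvell datasheet).
G2_SMI_PHY_CMD = 0x18
G2_SMI_PHY_DATA = 0x19
CMD_BUSY = 0x8000
CMD_MODE_22 = 0x1000      # clause 22 access
CMD_OP_WRITE = 0x0400
CMD_OP_READ = 0x0800
PHY_PAGE_REG = 22         # Marvell PHY page selector (assumption)


def smi_phy_cmd(op, phy, reg):
    """Build the command word for one indirect PHY access."""
    return CMD_BUSY | CMD_MODE_22 | op | ((phy & 0x1F) << 5) | (reg & 0x1F)


def plan_paged_read(phy, page, reg):
    """List the switch register accesses needed to read one paged PHY register.

    First indirection: every PHY access goes through the command/data pair.
    Second indirection: the PHY page has to be selected before the read.
    A real implementation must also wait for CMD_BUSY to clear after each
    command; that polling is omitted here.
    """
    return [
        ("write", G2_SMI_PHY_DATA, page),
        ("write", G2_SMI_PHY_CMD, smi_phy_cmd(CMD_OP_WRITE, phy, PHY_PAGE_REG)),
        ("write", G2_SMI_PHY_CMD, smi_phy_cmd(CMD_OP_READ, phy, reg)),
        ("read", G2_SMI_PHY_DATA, None),
    ]


for step in plan_paged_read(phy=2, page=2, reg=16):
    print(step)
```

So a single paged PHY register read turns into four accesses to the switch itself, plus the busy polling in between.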

You’re right, I’ve looked only at the swconfig driver and, as you said, it’s the bare minimum. My idea was to hack the swconfig driver (as it already contains functions for indirect register addressing and proper locking) by exporting custom netlink attributes for raw register access. It seems this part can be done quite easily and doesn’t even require touching the swconfig userspace tool.

Thanks for the DSA tip, it indeed looks promising! Nevertheless, I think I will be able to obtain the Marvell datasheet in a somewhat official way next week.

There is no need to hack the driver. MDIO access is possible on eth1 using address 0x10. Maybe we should switch to dsa like other devices.

This is what I have:

port(3): status = 500f
        -pause_en +my_pause -tx_paused -flow_ctrl -hd_flow +phy_detect
        -link duplex=half speed=10 mode=15
        -bit6(eee)
port(4): status = dd0f
        +pause_en +my_pause -tx_paused -flow_ctrl -hd_flow +phy_detect
        +link duplex=full speed=100 mode=15
        -bit6(eee)
port(5): status = 4e07
        -pause_en +my_pause -tx_paused -flow_ctrl -hd_flow -phy_detect
        +link duplex=full speed=1000 mode=7
        -bit6(eee)
port(6): status = 4e07
        -pause_en +my_pause -tx_paused -flow_ctrl -hd_flow -phy_detect
        +link duplex=full speed=1000 mode=7
        -bit6(eee)
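
The status words in this dump can be decoded by hand. The sketch below uses bit positions inferred from the four values above (they reproduce the printed flags for all four ports), so treat the layout as an assumption rather than datasheet fact; the function name is made up.

```python
SPEEDS = {0: 10, 1: 100, 2: 1000}


def decode_port_status(status):
    """Split a 16-bit port status word into the flags shown in the dump.

    Bit positions are inferred from the dumped values, not from the datasheet.
    """
    return {
        "pause_en": bool(status & 0x8000),
        "my_pause": bool(status & 0x4000),
        "hd_flow": bool(status & 0x2000),
        "phy_detect": bool(status & 0x1000),
        "link": bool(status & 0x0800),
        "duplex": "full" if status & 0x0400 else "half",
        "speed": SPEEDS.get((status >> 8) & 0x3),
        "eee": bool(status & 0x0040),
        "tx_paused": bool(status & 0x0020),
        "flow_ctrl": bool(status & 0x0010),
        "mode": status & 0x000F,
    }


for port, status in [(3, 0x500F), (4, 0xDD0F), (5, 0x4E07), (6, 0x4E07)]:
    print(port, decode_port_status(status))
```

Ports 5 and 6 are the ones without phy_detect and with mode 7, which fits them being the CPU-facing ports rather than copper LAN ports.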

Still in development: port mode (tagging etc) and phy access.

Notes: using dsa will create a problem: the driver is not prepared to have multiple cpu ports.

My wifi works stable. It was caused by the DHCP server not answering DHCP renews.

Status update: have phy access working and may even have found a solution.

Working on a phy reset with autoneg restart and compiling it on OpenWRT.

adminX: That’s great news, keep us updated!

Binary: http://colorfulpyro.de/fix-switch
Source: http://colorfulpyro.de/fix-switch.c

It simply unconditionally resets the 5 lan ports, sets them to 10/100/1000 half/full and no pause and then starts auto-negotiation. It is a bit unsafe as there are no locks or something like this in place. Do not call swconfig or ethtool while this runs.

It should be called after swconfig has setup the vlans because swconfig will reset the switch.
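
For anyone who does not want to read the C source, the sequence described above maps onto the standard IEEE 802.3 clause 22 PHY registers roughly as follows. This is only a sketch of what such a reset would write, using the generic register bit values (the same ones Linux defines in mii.h); it is not a transcription of fix-switch.c, and the function name is made up.

```python
# Standard clause 22 register bits (same values as in Linux's mii.h).
MII_BMCR, MII_ADVERTISE, MII_CTRL1000 = 0, 4, 9
BMCR_RESET = 0x8000
BMCR_ANENABLE = 0x1000
BMCR_ANRESTART = 0x0200
ADVERTISE_CSMA = 0x0001
ADVERTISE_10HALF = 0x0020
ADVERTISE_10FULL = 0x0040
ADVERTISE_100HALF = 0x0080
ADVERTISE_100FULL = 0x0100
ADVERTISE_1000HALF = 0x0100   # in register 9
ADVERTISE_1000FULL = 0x0200   # in register 9


def reset_sequence():
    """Return (register, value) writes for: reset, advertise all, restart AN.

    The pause bits (0x0400 / 0x0800 in register 4) are deliberately left out,
    matching the "no pause" part of the description.
    """
    adv = (ADVERTISE_CSMA | ADVERTISE_10HALF | ADVERTISE_10FULL
           | ADVERTISE_100HALF | ADVERTISE_100FULL)
    ctrl1000 = ADVERTISE_1000HALF | ADVERTISE_1000FULL
    return [
        (MII_BMCR, BMCR_RESET),
        (MII_ADVERTISE, adv),
        (MII_CTRL1000, ctrl1000),
        (MII_BMCR, BMCR_ANENABLE | BMCR_ANRESTART),
    ]


for reg, value in reset_sequence():
    print(f"reg {reg:2d} <- 0x{value:04x}")
```

Each of the 5 LAN port PHYs would get the same four writes, with a wait for the reset bit to self-clear after the first one.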

Interesting feature: my switch got broken. Now i have a fully working example how it should not be.

Do you mean it’s permanently broken? Was it caused by posted code?

Nope. Will probably only need a reset

Everything Okay. Switch works but my Raspberry PI2 had some problems with the µSD connection and did not boot.

Interesting thing is 12 source and 1 binary downloads but zero notification if it changed something.

@adminX I will try it today, I didn’t have time yesterday. I will let you know.

@adminX I’ve tried your code and it helped. I’m now also able to read (and probably write) the switch EEPROM via MDIO, and it seems it’s empty. Maybe some initialization is done in U-Boot. Do you have any suggestions on which registers should be initialized to which values? I’ve looked at your code and, besides accessing undocumented registers and bits (phy reg16_3 and phy reg16_2 bit 5), it’s more or less just a phy reset.

PHY dump when link OK:
PHY2 00_0=0x1140
PHY2 01_0=0x796d
PHY2 02_0=0x0141
PHY2 03_0=0x0eb1
PHY2 04_0=0x05e1
PHY2 05_0=0xc5e1
PHY2 06_0=0x000f
PHY2 07_0=0x2001
PHY2 08_0=0x6801
PHY2 09_0=0x0e00
PHY2 10_0=0x7c00
PHY2 13_0=0x0003
PHY2 14_0=0x0000
PHY2 15_0=0x3000
PHY2 16_0=0x3360
PHY2 17_0=0xbf08
PHY2 18_0=0x0000
PHY2 19_0=0x1c50
PHY2 20_0=0x0020
PHY2 21_0=0x0000
PHY2 23_0=0x0000
PHY2 26_0=0x8040
PHY2 00_2=0x0000
PHY2 16_2=0xe308
PHY2 18_2=0x0000
PHY2 19_2=0x0000
PHY2 21_2=0x1046

PHY dump when broken (no link):
PHY2 00_0=0x1140
PHY2 01_0=0x7949
PHY2 02_0=0x0141
PHY2 03_0=0x0eb1
PHY2 04_0=0x05e1
PHY2 05_0=0x0000
PHY2 06_0=0x0004
PHY2 07_0=0x2001
PHY2 08_0=0x0000
PHY2 09_0=0x0e00
PHY2 10_0=0x4000
PHY2 13_0=0x0003
PHY2 14_0=0x0000
PHY2 15_0=0x3000
PHY2 16_0=0x3360
PHY2 17_0=0x8050
PHY2 18_0=0x0000
PHY2 19_0=0x0040
PHY2 20_0=0x0020
PHY2 21_0=0x0000
PHY2 23_0=0x0000
PHY2 26_0=0x8040
PHY2 00_2=0x0000
PHY2 16_2=0xe308
PHY2 18_2=0x0000
PHY2 19_2=0x0000
PHY2 21_2=0x1046
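
To make the two dumps easier to compare, here is a small sketch that diffs them and decodes the basic status register (register 1) using the standard clause 22 bits for link status (bit 2) and autonegotiation complete (bit 5). The dump values are copied from above; the helper names are made up.

```python
OK_DUMP = """
00_0=0x1140 01_0=0x796d 02_0=0x0141 03_0=0x0eb1 04_0=0x05e1 05_0=0xc5e1
06_0=0x000f 07_0=0x2001 08_0=0x6801 09_0=0x0e00 10_0=0x7c00 13_0=0x0003
14_0=0x0000 15_0=0x3000 16_0=0x3360 17_0=0xbf08 18_0=0x0000 19_0=0x1c50
20_0=0x0020 21_0=0x0000 23_0=0x0000 26_0=0x8040 00_2=0x0000 16_2=0xe308
18_2=0x0000 19_2=0x0000 21_2=0x1046
"""
BROKEN_DUMP = """
00_0=0x1140 01_0=0x7949 02_0=0x0141 03_0=0x0eb1 04_0=0x05e1 05_0=0x0000
06_0=0x0004 07_0=0x2001 08_0=0x0000 09_0=0x0e00 10_0=0x4000 13_0=0x0003
14_0=0x0000 15_0=0x3000 16_0=0x3360 17_0=0x8050 18_0=0x0000 19_0=0x0040
20_0=0x0020 21_0=0x0000 23_0=0x0000 26_0=0x8040 00_2=0x0000 16_2=0xe308
18_2=0x0000 19_2=0x0000 21_2=0x1046
"""

BMSR_LSTATUS = 0x0004        # link is up
BMSR_ANEGCOMPLETE = 0x0020   # autonegotiation finished


def parse_dump(text):
    """Turn 'RR_P=0xVVVV' items into a {name: value} dict."""
    regs = {}
    for item in text.split():
        name, value = item.split("=")
        regs[name] = int(value, 16)
    return regs


def changed_registers(ok, broken):
    """Registers whose value differs between the two dumps."""
    return {name: (ok[name], broken.get(name))
            for name in ok if ok[name] != broken.get(name)}


def decode_bmsr(value):
    return {"link": bool(value & BMSR_LSTATUS),
            "autoneg_complete": bool(value & BMSR_ANEGCOMPLETE)}


ok, broken = parse_dump(OK_DUMP), parse_dump(BROKEN_DUMP)
for name, (good, bad) in sorted(changed_registers(ok, broken).items()):
    print(f"{name}: ok=0x{good:04x} broken=0x{bad:04x}")
print("ok    ", decode_bmsr(ok["01_0"]))
print("broken", decode_bmsr(broken["01_0"]))
```

Only status-type registers on page 0 differ (1, 5, 6, 8, 10, 17, 19); the control registers 0, 4, 9 and 16 and everything on page 2 are identical in both states, and register 5 (link partner ability) reads 0x0000 in the broken case.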

It may be somehow related to the energy detect feature. The 88e6176 has the following energy detect modes available:

  • Disabled (port still outputs link pulses / idle symbols)
  • Sense only (just listening, no tx)
  • Sense and transmit a single NLP each second

PHYs are configured to the last mode.

Your non-working dump says no link and no announcement from the other side. The energy detect mode "sense and transmit NLP each second" should wake every switch on the other side. So this all seems correct.
My code does about the same things the other (unused in *WRT) kernel driver does. There seems to be some errata about how to reset the PHYs.

The main cause could be a reset pulse that is too short: it gets detected by the switch but not by the PHYs. As only 2 bits in the PHY get changed in the mvsw61xx source, this may leave the PHYs in an undefined state.

phy page 0 register 16 gets bit 4 cleared (energy detect mode?)
phy page 0 register 0 gets bit 11 cleared (powerdown get set to disabled)

mvsw_get_reg(16,3) will give you the switch hardware id. If it is different from 1761, it could mean we have different revisions and they use different internal reset procedures.

In the end i am glad my crystal ball worked this time.

Do you mean hw reset pulse generated by board on reset pin or internal reset triggered by bit 15 of switch global control register? Because the only thing which is reset by this bit is MAC state machine, it’s not propagated to PHYs.

phy page 0 register 16 bit 4 is marked as reserved; energy detect mode is configured in bits 9:8.

We have the same revision (1761).
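
The bits discussed in the last few posts can be checked against the values in the PHY dumps (page 0 register 16 reads 0x3360 and register 0 reads 0x1140 in both the working and the broken dump). A small sketch; the bits helper is made up, and what each field value means is not decoded here:

```python
def bits(value, hi, lo):
    """Extract bits hi..lo (inclusive) from a register value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


REG16_PAGE0 = 0x3360   # from both dumps
REG0_PAGE0 = 0x1140    # from both dumps

energy_detect = bits(REG16_PAGE0, 9, 8)   # field named in the post above
reg16_bit4 = bits(REG16_PAGE0, 4, 4)      # bit cleared by mvsw61xx
power_down = bits(REG0_PAGE0, 11, 11)     # bit cleared by mvsw61xx

print(f"energy detect field (bits 9:8): {energy_detect}")
print(f"register 16 bit 4:              {reg16_bit4}")
print(f"register 0 bit 11 (power down): {power_down}")
```

Both bits touched by mvsw61xx read 0 in either state, and the energy detect field is 3 in both, so none of these explains the difference between a working and a broken port on its own.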