Switch ports offline if plugged in during boot

You’re right, I had only looked at the swconfig driver and, as you said, it’s the bare minimum. My idea was to hack the swconfig driver (as it already contains functions for indirect register addressing and proper locking) by exporting custom netlink attributes for raw register access. It seems this part can be done quite easily and doesn’t even require touching the swconfig userspace tool.

Thanks for the DSA tip, it does look promising! Nevertheless, I think I will be able to obtain the Marvell datasheet through a somewhat official channel next week.

There is no need to hack the driver. MDIO access is possible on eth1 using address 0x10. Maybe we should switch to DSA like other devices.
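For reference, the indirect register addressing mentioned above works by writing a command word to an SMI Command register and moving data through an SMI Data register. The sketch below only builds that command word. It assumes the switch is strapped for multi-chip addressing (so 0x10 is the switch's own SMI address) and takes the field layout from the Linux mv88e6xxx driver's SMI Command definitions; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* SMI Command register (register 0x00 at the switch's SMI address) fields,
 * as defined in the Linux mv88e6xxx driver; assumed valid for the 88E6176. */
#define SMI_CMD_BUSY     0x8000u /* set to start; the switch clears it when done */
#define SMI_CMD_MODE_22  0x1000u /* clause 22 frame format */
#define SMI_CMD_OP_WRITE 0x0400u
#define SMI_CMD_OP_READ  0x0800u

/* Command word that reads register `reg` of internal device `dev`
 * (hypothetical helper; e.g. dev 0x10 is assumed to be port 0). */
static uint16_t smi_read_cmd(unsigned dev, unsigned reg)
{
    assert(dev < 32 && reg < 32);
    return (uint16_t)(SMI_CMD_BUSY | SMI_CMD_MODE_22 | SMI_CMD_OP_READ |
                      (dev << 5) | reg);
}

/* Command word that writes register `reg` of internal device `dev`;
 * the value must be placed in the SMI Data register (0x01) first. */
static uint16_t smi_write_cmd(unsigned dev, unsigned reg)
{
    assert(dev < 32 && reg < 32);
    return (uint16_t)(SMI_CMD_BUSY | SMI_CMD_MODE_22 | SMI_CMD_OP_WRITE |
                      (dev << 5) | reg);
}
```

A read of internal device 0x10, register 3 encodes as 0x9A03. The caller would write that word to register 0 at the switch address, poll until bit 15 clears, and then read the result from register 1. Those two steps are exactly where the driver's locking matters, since another access must not interleave.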

This is what I have:

port(3): status = 500f
        -pause_en +my_pause -tx_paused -flow_ctrl -hd_flow +phy_detect
        -link duplex=half speed=10 mode=15
        -bit6(eee)
port(4): status = dd0f
        +pause_en +my_pause -tx_paused -flow_ctrl -hd_flow +phy_detect
        +link duplex=full speed=100 mode=15
        -bit6(eee)
port(5): status = 4e07
        -pause_en +my_pause -tx_paused -flow_ctrl -hd_flow -phy_detect
        +link duplex=full speed=1000 mode=7
        -bit6(eee)
port(6): status = 4e07
        -pause_en +my_pause -tx_paused -flow_ctrl -hd_flow -phy_detect
        +link duplex=full speed=1000 mode=7
        -bit6(eee)
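The dump above decodes the per-port status register field by field. A minimal decoder is sketched below. The bit positions are taken from the Linux mv88e6xxx port status definitions and agree with all three raw values shown (0x500f, 0xdd0f, 0x4e07). Treating a speed field of 3 as invalid on this chip is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded fields of the per-port status register (port register 0x00). */
struct port_status {
    bool pause_en, my_pause, hd_flow, phy_detect, link, full_duplex;
    bool eee, tx_paused, flow_ctrl;
    unsigned speed_mbps; /* 10, 100, 1000, or 0 if bits 9:8 read 3 (assumed invalid) */
    unsigned cmode;      /* bits 3:0, printed as "mode" in the dump */
};

static struct port_status decode_port_status(uint16_t s)
{
    static const unsigned speeds[4] = { 10, 100, 1000, 0 };
    struct port_status p = {
        .pause_en    = s & 0x8000,
        .my_pause    = s & 0x4000,
        .hd_flow     = s & 0x2000,
        .phy_detect  = s & 0x1000,
        .link        = s & 0x0800,
        .full_duplex = s & 0x0400,
        .eee         = s & 0x0040,
        .tx_paused   = s & 0x0020,
        .flow_ctrl   = s & 0x0010,
        .speed_mbps  = speeds[(s >> 8) & 0x3],
        .cmode       = s & 0x000f,
    };
    return p;
}
```

For example, 0x500f yields no link, half duplex, 10 Mbit/s and mode 15, matching port 3 above.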

Still in development: port mode (tagging, etc.) and PHY access.

Note: switching to DSA will create a problem, because the driver is not prepared to handle multiple CPU ports.

My Wi-Fi is stable now. The earlier problem was caused by the DHCP server not answering DHCP renew requests.

Status update: I have PHY access working and may even have found a solution.

I’m working on a PHY reset with an autoneg restart and compiling it for OpenWrt.

adminX: That’s great news, keep us updated!

Binary: http://colorfulpyro.de/fix-switch
Source: http://colorfulpyro.de/fix-switch.c

It unconditionally resets the 5 LAN ports, sets them to advertise 10/100/1000 half/full duplex with no pause, and then starts auto-negotiation. It is a bit unsafe because there are no locks or similar protection in place, so do not call swconfig or ethtool while it runs.
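The sequence described above can be sketched with only the standard IEEE 802.3 clause 22 registers. This is not the code behind the link above. The Marvell-specific and undocumented register writes it performs are not reproduced. `phy_read`/`phy_write` are hypothetical stand-ins backed by an array so the sketch runs anywhere, and mapping PHYs 0 to 4 onto the five LAN ports is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Standard clause 22 register numbers and bits. */
#define MII_BMCR       0x00
#define MII_ADVERTISE  0x04
#define MII_CTRL1000   0x09
#define BMCR_RESET     0x8000
#define BMCR_ANENABLE  0x1000
#define BMCR_PDOWN     0x0800
#define BMCR_ANRESTART 0x0200
#define ADV_CSMA       0x0001 /* selector field: IEEE 802.3 */
#define ADV_10HALF     0x0020
#define ADV_10FULL     0x0040
#define ADV_100HALF    0x0080
#define ADV_100FULL    0x0100
#define ADV_1000HALF   0x0100 /* in MII_CTRL1000 */
#define ADV_1000FULL   0x0200 /* in MII_CTRL1000 */

#define NUM_LAN_PHYS 5

/* Hypothetical in-memory register file replacing real MDIO accessors. */
static uint16_t phy_regs[NUM_LAN_PHYS][32];
static uint16_t phy_read(int phy, int reg) { return phy_regs[phy][reg]; }
static void phy_write(int phy, int reg, uint16_t v) { phy_regs[phy][reg] = v; }

/* Advertise 10/100/1000 half+full, leave the pause bits (10, 11) clear,
 * then reset the PHY with auto-negotiation enabled and restarted. */
static void reset_lan_phy(int phy)
{
    assert(phy >= 0 && phy < NUM_LAN_PHYS);
    phy_write(phy, MII_ADVERTISE,
              ADV_CSMA | ADV_10HALF | ADV_10FULL | ADV_100HALF | ADV_100FULL);
    uint16_t gig = phy_read(phy, MII_CTRL1000);
    phy_write(phy, MII_CTRL1000, (uint16_t)(gig | ADV_1000HALF | ADV_1000FULL));
    uint16_t bmcr = phy_read(phy, MII_BMCR) & (uint16_t)~BMCR_PDOWN;
    phy_write(phy, MII_BMCR,
              (uint16_t)(bmcr | BMCR_RESET | BMCR_ANENABLE | BMCR_ANRESTART));
}

static void reset_all_lan_phys(void)
{
    for (int phy = 0; phy < NUM_LAN_PHYS; phy++)
        reset_lan_phy(phy);
}
```

On real hardware the reset bit self-clears. The mock simply keeps the value written, which makes the result easy to check.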

It should be called after swconfig has set up the VLANs, because swconfig will reset the switch.

Interesting development: my switch got broken. Now I have a fully working example of how it should not be.

Do you mean it’s permanently broken? Was it caused by the posted code?

Nope. It will probably just need a reset.

Everything is OK. The switch works, but my Raspberry Pi 2 had some problems with the µSD connection and did not boot.

Interestingly, there have been 12 source and 1 binary downloads, but zero feedback on whether it changed anything.

@adminX I will try it today; I didn’t have time yesterday. I will let you know.

@adminX I’ve tried your code and it helped. I’m now also able to read (and probably write) the switch EEPROM via MDIO, and it seems to be empty. Maybe some initialization is done in U-Boot. Do you have any suggestions as to which registers should be initialized to which values? I’ve looked at your code, and besides accessing undocumented registers and bits (PHY reg16_3 and PHY reg16_2 bit 5), it’s more or less just a PHY reset.

PHY dump when link OK:
PHY2 00_0=0x1140
PHY2 01_0=0x796d
PHY2 02_0=0x0141
PHY2 03_0=0x0eb1
PHY2 04_0=0x05e1
PHY2 05_0=0xc5e1
PHY2 06_0=0x000f
PHY2 07_0=0x2001
PHY2 08_0=0x6801
PHY2 09_0=0x0e00
PHY2 10_0=0x7c00
PHY2 13_0=0x0003
PHY2 14_0=0x0000
PHY2 15_0=0x3000
PHY2 16_0=0x3360
PHY2 17_0=0xbf08
PHY2 18_0=0x0000
PHY2 19_0=0x1c50
PHY2 20_0=0x0020
PHY2 21_0=0x0000
PHY2 23_0=0x0000
PHY2 26_0=0x8040
PHY2 00_2=0x0000
PHY2 16_2=0xe308
PHY2 18_2=0x0000
PHY2 19_2=0x0000
PHY2 21_2=0x1046

PHY dump when broken (no link):
PHY2 00_0=0x1140
PHY2 01_0=0x7949
PHY2 02_0=0x0141
PHY2 03_0=0x0eb1
PHY2 04_0=0x05e1
PHY2 05_0=0x0000
PHY2 06_0=0x0004
PHY2 07_0=0x2001
PHY2 08_0=0x0000
PHY2 09_0=0x0e00
PHY2 10_0=0x4000
PHY2 13_0=0x0003
PHY2 14_0=0x0000
PHY2 15_0=0x3000
PHY2 16_0=0x3360
PHY2 17_0=0x8050
PHY2 18_0=0x0000
PHY2 19_0=0x0040
PHY2 20_0=0x0020
PHY2 21_0=0x0000
PHY2 23_0=0x0000
PHY2 26_0=0x8040
PHY2 00_2=0x0000
PHY2 16_2=0xe308
PHY2 18_2=0x0000
PHY2 19_2=0x0000
PHY2 21_2=0x1046
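The dumps name registers as REG_PAGE: 16_2 is register 16 on page 2, and 16_0 is the "page 0 register 16" discussed later in the thread. On Marvell PHYs the page is normally selected through register 22. The sketch below assumes that, and `mdio_read`/`mdio_write` are hypothetical stand-ins for a single PHY, backed by an in-memory array.

```c
#include <assert.h>
#include <stdint.h>

#define MARVELL_PAGE_REG 22 /* page address register (assumed for these PHYs) */
#define NUM_PAGES 8

/* Hypothetical register file for one PHY: bank[page][reg]. */
static uint16_t bank[NUM_PAGES][32];
static uint16_t cur_page;

static void mdio_write(int reg, uint16_t val)
{
    if (reg == MARVELL_PAGE_REG)
        cur_page = val & (NUM_PAGES - 1); /* keep the mock in bounds */
    else
        bank[cur_page][reg] = val;
}

static uint16_t mdio_read(int reg)
{
    return reg == MARVELL_PAGE_REG ? cur_page : bank[cur_page][reg];
}

/* Read `reg` on `page` ("reg_page" in the dumps) and restore the page
 * that was selected before, so other code is not surprised. */
static uint16_t read_paged(int page, int reg)
{
    assert(page >= 0 && page < NUM_PAGES);
    assert(reg >= 0 && reg < 32 && reg != MARVELL_PAGE_REG);
    uint16_t saved = mdio_read(MARVELL_PAGE_REG);
    mdio_write(MARVELL_PAGE_REG, (uint16_t)page);
    uint16_t val = mdio_read(reg);
    mdio_write(MARVELL_PAGE_REG, saved);
    return val;
}
```

Restoring the previous page matters for the same reason as the locking concern earlier: any other code touching the PHY assumes its own page selection is still in effect.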

It may be related to the energy detect feature. The 88E6176 has the following energy detect modes available:

  • disabled (the port still outputs link pulses / idle symbols)
  • sense only (just listening, no TX)
  • sense and transmit a single NLP each second

The PHYs are configured to the last mode.

Your non-working dump says there is no link and no announcement from the other side. In the "sense and transmit NLP each second" energy detect mode, the PHY should wake any switch on the other side, so this all seems correct.
My code does roughly the same things as the other kernel driver (unused in *WRT). There seem to be some errata about how to reset the PHYs.
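The "no link and no announcement" reading of the dumps comes from two standard clause 22 registers: the status register (01_0) and the link partner ability register (05_0). A small sketch of that check follows. The bit values are the IEEE 802.3 definitions and the struct is hypothetical. Keep in mind that the link bit latches low, so a single read taken just after a link change can under-report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BMSR_LSTATUS      0x0004 /* register 1 bit 2: link status (latches low) */
#define BMSR_ANEGCOMPLETE 0x0020 /* register 1 bit 5: auto-negotiation complete */

struct link_view {
    bool link_up;
    bool aneg_complete;
    bool partner_advertised; /* register 5 non-zero: the partner announced abilities */
};

static struct link_view decode_link(uint16_t bmsr, uint16_t lpa)
{
    struct link_view v = {
        .link_up            = bmsr & BMSR_LSTATUS,
        .aneg_complete      = bmsr & BMSR_ANEGCOMPLETE,
        .partner_advertised = lpa != 0,
    };
    return v;
}
```

With the values from the two dumps, `decode_link(0x796d, 0xc5e1)` reports link, completed negotiation and a partner advertisement. `decode_link(0x7949, 0x0000)` reports none of the three. Bits 2 and 5 of register 1 are exactly the bits that differ between 0x796d and 0x7949.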

The main cause could be a reset pulse that is too short: it gets detected by the switch but not by the PHYs. Since the mvsw61xx source changes only 2 bits in the PHY, this may leave the PHYs in an undefined state:

PHY page 0 register 16: bit 4 gets cleared (energy detect mode?)
PHY page 0 register 0: bit 11 gets cleared (power-down gets disabled)
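Expressed as read-modify-write masks, those two changes look like the sketch below. Register 0 bit 11 is the standard clause 22 power-down bit. The meaning of register 16 bit 4 is unclear (it is reported as reserved later in this thread), so its macro name is a neutral placeholder.

```c
#include <assert.h>
#include <stdint.h>

#define BMCR_PDOWN 0x0800 /* page 0 reg 0 bit 11: power down */
#define REG16_BIT4 0x0010 /* page 0 reg 16 bit 4: meaning unclear */

/* Return `val` with the bits in `mask` cleared (the "modify" step of a
 * read-modify-write; reading and writing the PHY is left to the caller). */
static uint16_t clear_bits(uint16_t val, uint16_t mask)
{
    return (uint16_t)(val & ~mask);
}
```

In both dumps above, 00_0 reads 0x1140 and 16_0 reads 0x3360, so both bits are already clear there. Applying these two masks would leave those values unchanged.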

mvsw_get_reg(16,3) will give you the switch hardware ID. If it is different from 1761, it could mean we have different revisions that use different internal reset procedures.
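To compare revisions, the ID value can be split into a product number and a revision. The sketch assumes the usual Marvell layout of the switch ID register: product number in bits 15:4, revision in bits 3:0. It also assumes that "1761" is a hex value. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct switch_id {
    uint16_t product; /* bits 15:4 */
    uint8_t revision; /* bits 3:0 */
};

static struct switch_id split_switch_id(uint16_t reg3)
{
    struct switch_id id = {
        .product  = (uint16_t)(reg3 >> 4),
        .revision = (uint8_t)(reg3 & 0xf),
    };
    return id;
}
```

Under these assumptions, 0x1761 splits into product 0x176 (consistent with the 88E6176 name) and revision 1.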

In the end, I am glad my crystal ball worked this time.

Do you mean the hardware reset pulse generated by the board on the reset pin, or the internal reset triggered by bit 15 of the switch Global Control register? The only thing reset by that bit is the MAC state machine; it is not propagated to the PHYs.

In PHY page 0 register 16, bit 4 is marked as reserved; the energy detect mode is configured in bits 9:8.
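A decoder for that two-bit field is sketched below. The encoding (0x = off, 10 = sense only, 11 = sense and periodically transmit NLP) is assumed from other Marvell Alaska PHYs and has not been checked against the 88E6176 datasheet.

```c
#include <assert.h>
#include <stdint.h>

enum energy_detect {
    ED_OFF,           /* field 0b00 or 0b01 */
    ED_SENSE_ONLY,    /* field 0b10: listen only, no TX */
    ED_SENSE_AND_NLP, /* field 0b11: listen and periodically transmit an NLP */
};

/* Decode bits 9:8 of PHY page 0 register 16 ("16_0" in the dumps). */
static enum energy_detect decode_energy_detect(uint16_t reg16_0)
{
    switch ((reg16_0 >> 8) & 0x3) {
    case 2:
        return ED_SENSE_ONLY;
    case 3:
        return ED_SENSE_AND_NLP;
    default:
        return ED_OFF;
    }
}
```

With this assumed encoding, the 0x3360 seen in both dumps decodes to the sense-and-transmit-NLP mode. That agrees with the earlier statement that the PHYs are configured to the last mode.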

We have the same revision (1761).

I mean the one generated by the SoC. It is automatically pulled down when the SoC reset is pulled down, but it can also be partly influenced by some settings.

Hi guys, could the problem be related only to the Broadcom Tigon tg3 NIC? I have two wired computers with the same software (Gentoo), both getting reserved IPs from the Turris via DHCP. When I restart the Turris, there is no network on the computer with the tg3, while the other computer, with a Realtek r8169, has no problem.

I see this in dmesg:

[Dec13 08:52] r8169 0000:04:00.0 enp4s0: link down
[ +11.479235] r8169 0000:04:00.0 enp4s0: link down
[ +3.351464] r8169 0000:04:00.0 enp4s0: link up

On the other computer, there is just:
[Dec13 08:52] tg3 0000:07:00.0 enp7s0: Link is down

and I get the following only when I restart the network manually (/etc/init.d/net.enp7s0 restart):
tg3 0000:07:00.0 enp7s0: Link is up at 1000 Mbps, full duplex
tg3 0000:07:00.0 enp7s0: Flow control is on for TX and on for RX

It seems that the Apple Thunderbolt to Gigabit Ethernet adapter (jan357cz) uses a Broadcom BCM57762, which is also tg3-related…

In the past, there was no problem with the tg3 NIC and other routers.

So far I’ve tried with:

  • switch TP-Link TL-SG1016: happens virtually every time
  • integrated NIC in my PC (Marvell, sky2): happened about 3 times
  • Comtrend VR-3026e v2, unknown 100 Mb/s chipset: never happened

I’d like to test more combinations, but my options are limited because the Omnia is located in a hard-to-reach, dirty place, and my roommates (and I) don’t like being without internet for too long. Maybe I will buy another Omnia just for testing and playing, hehe.

It could be Broadcom-based; I found that the SG2216 is.

I have not had any problems with offline ports or links going down.

My setup has been:

Internet
|
|
Omnia
|
|
Switch
| | | |____ mac server
| | | _____ mac client 1
| |_______ mac client 2
|_________client…

But after I discovered duplicate replies when pinging the server, I thought something might be up with the unmanaged Netgear gigabit switches and perhaps some old VLANs. I reset the switch, but it made no difference.
64 bytes from 192.168.21.199: icmp_seq=0 ttl=64 time=1.492 ms
64 bytes from 192.168.0.239: icmp_seq=0 ttl=255 time=8.215 ms (DUP!)

I then moved the server from the switch directly to the TO, and the duplicates disappeared.
I then pinged a client and got duplicates… I moved it to the TO and the duplicates were gone.

Internet
|
|
Omnia
| | | |____ mac server
| | | _____ mac client 1
| |_______ mac client 2
|
|
Switch
| | | |____
| | | _____
| |_______
|_________client…

But suddenly I was not able to connect to either Foris or LuCI: “Server aborted connection” every time.

As I was still connected over SSH, I did a reboot…

And now all of the LAN ports on the TO were offline!
I physically disconnected the Ethernet cable, waited 4 seconds, reconnected it, and got a link up.
I had to do the same with all clients and the switch. I have never needed to do this before.

I did another reboot and got a link immediately for the switch, something I did not get when I had the switch, server and client connected.

But the duplicates are back, although now the duplicate comes from a different IP range than before. This time it is from the TO’s own range.

64 bytes from 192.168.21.199: icmp_seq=0 ttl=64 time=1.799 ms
64 bytes from 192.168.21.79: icmp_seq=0 ttl=255 time=8.121 ms (DUP!)

The server does not run DHCP; only the TO does.
The server and one of the clients have static leases set up in the TO.

I then connected my MacBook Air via a Thunderbolt-to-Ethernet adapter to the switch that the server and clients are connected to, and pinged the MacBook Air from the server.
I then got duplicates again.

PING 192.168.21.60 (192.168.21.60): 56 data bytes
64 bytes from 192.168.21.60: icmp_seq=0 ttl=64 time=0.553 ms
64 bytes from 192.168.21.79: icmp_seq=0 ttl=255 time=4.794 ms (DUP!)

When I looked at the IPs, I found a ghost MAC address, and I have no idea where it comes from.

2016-12-13T14:01:15+01:00 info dnsmasq-dhcp[25833]: DHCPREQUEST(br-lan) 192.168.21.79 c0:3f:0e:3c:29:f5
2016-12-13T14:01:15+01:00 info dnsmasq-dhcp[25833]: DHCPACK(br-lan) 192.168.21.79 c0:3f:0e:3c:29:f5

My original MAC address looks like this:
2016-12-13T14:09:57+01:00 info dnsmasq-dhcp[25833]: DHCPDISCOVER(br-lan) xx:xx:xx:11:ea:8e
2016-12-13T14:09:57+01:00 info dnsmasq-dhcp[25833]: DHCPOFFER(br-lan) 192.168.21.60 xx:xx:xx:11:ea:8e
2016-12-13T14:09:58+01:00 info dnsmasq-dhcp[25833]: DHCPREQUEST(br-lan) 192.168.21.60 xx:xx:xx:11:ea:8e
2016-12-13T14:09:58+01:00 info dnsmasq-dhcp[25833]: DHCPACK(br-lan) 192.168.21.60 xx:xx:xx:11:ea:8e Air-18-i5

These ghost addresses are listed in the TO’s ARP table.

Any ideas?

Have I messed something up, or is all of this related somehow?

What kind of switch do you have? If you completely remove the switch, do you still get the DUP pings?

Another thing is the link-down after a TO reset; this is what some people in this thread see (and some don’t).

The Apple Thunderbolt-to-Ethernet adapter has a Broadcom chip inside (is it mac client 1?).

What NIC do you have in the mac server and mac client 2? They were all offline after the TO reboot, right?