Debugging repeated "link is down"

Symptom:

2019-08-12 11:02:23 notice netifd[]: Network device 'eth1' link is down
2019-08-12 11:02:23 notice netifd[]: Network alias 'eth1' link is down
2019-08-12 11:02:23 notice netifd[]: Interface 'wan6' has link connectivity loss
2019-08-12 11:02:23 notice netifd[]: Interface 'wan' has link connectivity loss
2019-08-12 11:02:23 info kernel[]: [83182.319922] mvneta f1034000.ethernet eth1: Link is Down
2019-08-12 11:02:48 notice netifd[]: Network device 'eth1' link is up
2019-08-12 11:02:48 notice netifd[]: Network alias 'eth1' link is up
2019-08-12 11:02:48 notice netifd[]: Interface 'wan6' has link connectivity 
2019-08-12 11:02:48 notice netifd[]: Interface 'wan' has link connectivity 
2019-08-12 11:02:48 info kernel[]: [83207.314906] mvneta f1034000.ethernet eth1: Link is Up - 100Mbps/Full - flow control off

I have seen these occasionally for quite a long time, but now they happen about 300 times a day, which is very annoying (each outage is long enough to break video streams, for example). The WAN port is connected to an ISP box that I can’t directly influence (it’s even shared with other customers).
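
To put numbers on the flapping before contacting anyone, the log can be reduced to a list of outages. The following is only a rough sketch: it assumes the log has been exported as plain text in exactly the format pasted above, and the function name `link_outages` is made up for illustration.

```python
import re
from datetime import datetime

# Matches kernel lines in the format shown above, e.g.
# "2019-08-12 11:02:23 info kernel[]: [83182.319922] mvneta ... eth1: Link is Down"
# Assumption: timestamps and the "kernel[]:" tag look exactly like this.
LINK_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) .*kernel\[\]: .* (\S+): Link is (Down|Up)"
)


def link_outages(lines, device="eth1"):
    """Return (down_time, up_time, seconds) for each Down -> Up pair on device."""
    outages = []
    down_at = None
    for line in lines:
        m = LINK_RE.match(line)
        if not m or m.group(2) != device:
            continue  # not a kernel link event for this device
        stamp = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        if m.group(3) == "Down":
            down_at = stamp
        elif down_at is not None:
            # "Up" following a recorded "Down": close the outage
            outages.append((down_at, stamp, (stamp - down_at).total_seconds()))
            down_at = None
    return outages
```

Feeding it the lines of an exported log gives one tuple per outage, so the count per day and the typical duration can be compared before and after any change (cable swap, different router, ISP fix).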

Do you have any idea how to debug this and narrow down where the problem is? Backup plan: temporarily switch to a cheap router and monitor with a continuous ping from a device inside the network.
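
The continuous-ping idea could look roughly like the sketch below. It assumes a Linux machine with the usual iputils `ping` (`-c` for count, `-W` for timeout in seconds); the target address `192.0.2.1` in the usage note is only a placeholder, and the `probe` and `max_samples` parameters are hypothetical additions so the loop can be exercised without a network.

```python
import subprocess
import time
from datetime import datetime


def ping_once(host, timeout_s=1):
    """Return True if a single ICMP echo to host is answered."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def monitor(host, probe=ping_once, interval_s=1.0, max_samples=None):
    """Probe host repeatedly and print a timestamped line on each state change.

    Runs until interrupted when max_samples is None; returns the list of
    (timestamp, is_up) transitions otherwise.
    """
    transitions = []
    was_ok = True  # assume the link is up when monitoring starts
    n = 0
    while max_samples is None or n < max_samples:
        ok = probe(host)
        if ok != was_ok:
            stamp = datetime.now().isoformat(timespec="seconds")
            print(stamp, "up" if ok else "DOWN", flush=True)
            transitions.append((stamp, ok))
            was_ok = ok
        n += 1
        time.sleep(interval_s)
    return transitions
```

Calling `monitor("192.0.2.1")` with a real upstream address in place of the placeholder prints only the transitions. A lost ping does not by itself say where the fault is, so the timestamps are mainly useful for comparing against the router log.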

The significant worsening correlated time-wise with an update (3.11.6 rc4 -> rc5), but reverting to stable (incl. reboot) didn’t seem to affect that, so I assume it was unrelated. The ISP box is a probable suspect, but I want to build a bit more confidence/evidence before reaching out to their support :wink:

It could be interference, an electrical problem, or a hardware fault, e.g. in the port on the upstream box or on the TO box, or in the cable (Cat 5/6/7?) or plug connecting the two nodes.


Try the TOS 4.x beta and see whether it reproduces (netifd is known to be buggy in TOS 3.x), though it seems there are few issues with the last release?

I actually have a spare Cat cable running the whole length, so I might start by swapping the two, though that would help only for a rather small set of causes.

After some HW fixes on the ISP box (apparently prompted by complaints from other customers), everything has been fine for many hours. Thanks for the suggestions.
