Ethernet cable test with ethtool --cable-test and --cable-test-tdr

Hi, I don’t currently have an Omnia, but looking at the kernel sources I think the hardware should support the Ethernet cable test and also retrieval of raw TDR (time domain reflectometry) data.

Both allow the detection of broken or damaged Ethernet cables and connectors etc. TDR is especially useful for marginal cables.

Would someone mind trying this? I’m currently debugging a different device with the same Ethernet PHY…

Cable test support was added around 2020 (Linux 5.8), so any recent kernel and ethtool should work:

ethtool --cable-test devname

and:

ethtool --cable-test-tdr devname

N.B. Running a test on a live link sometimes confuses the far-end Ethernet adapter, so you might need to run:

ethtool -r devname

afterwards to re-establish a link.

The ethtool-full package is needed with OpenWrt.

Thanks!

Tim.

Turris Omnia on Turris OS 7.1:

root@turris:~# opkg install ethtool-full
Installing ethtool-full (6.6-1) to root...
Downloading https://repo.turris.cz/hbs/omnia/packages/base/ethtool-full_6.6-1_arm_cortex-a9_vfpv3-d16.ipk
Configuring ethtool-full.

root@turris:~# ethtool --cable-test eth2
Cable test started for device eth2.
Cable test completed for device eth2.
Pair A code OK
Pair B code OK
Pair C code OK
Pair D code OK

root@turris:~# ethtool --cable-test-tdr eth2
Cable test TDR started for device eth2.
Cable test TDR completed for device eth2.
TDR Pulse 1000mV
Step configuration: 0.80-149.73 meters in 0.80m steps

I suspect the TDR test should have produced more output, but in general both commands do something…

Thanks!

You should be able to try an unused port, and plug in a patch cable and leave the other end unplugged to get output like this (using a 10m cable):

root@ap15:~# ethtool --cable-test eth1
Cable test started for device eth1.
Cable test completed for device eth1.
Pair A code Open Circuit
Pair B code Open Circuit
Pair C code Open Circuit
Pair D code Open Circuit
Pair A, fault length: 10.40m
Pair B, fault length: 10.40m
Pair C, fault length: 10.40m
Pair D, fault length: 10.40m

This can be useful for diagnosing bad fixed cabling in the building. It can be useful to run several times to get an average or to spot marginal cabling.
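A minimal sketch of the "run it several times" idea, assuming the "fault length" lines are formatted exactly like the output above. The function name avg_fault_length is made up for illustration; it only needs a POSIX shell and awk, and it reads the ethtool output from stdin rather than calling ethtool itself:

```shell
#!/bin/sh
# avg_fault_length (hypothetical helper): summarise the "fault length"
# lines from one or more `ethtool --cable-test` runs read on stdin.
# Prints mean, min and max per pair, so a large spread between runs
# stands out.
#
# Possible usage on the device (not run here):
#   for i in 1 2 3 4 5; do ethtool --cable-test eth1; done | avg_fault_length
avg_fault_length() {
    awk '
        # Matches lines such as: "Pair A, fault length: 10.40m"
        /^Pair [A-D], fault length:/ {
            pair = substr($2, 1, 1)      # "A," -> "A"
            len = $5
            sub(/m$/, "", len)           # "10.40m" -> "10.40"
            len = len + 0
            if (!(pair in n) || len < min[pair]) min[pair] = len
            if (!(pair in n) || len > max[pair]) max[pair] = len
            sum[pair] += len
            n[pair]++
        }
        END {
            for (i = 1; i <= 4; i++) {
                p = substr("ABCD", i, 1)
                if (n[p] > 0)
                    printf "Pair %s: mean %.2fm, min %.2fm, max %.2fm, n=%d\n",
                        p, sum[p] / n[p], min[p], max[p], n[p]
            }
        }
    '
}
```

Pairs that report no fault length (for example "code OK") are simply left out of the summary.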

You’re correct that your --cable-test-tdr output is truncated. Do you see the message "Timeout while waiting for cable test to finish" in the kernel log?

If you have time, this patch should fix the missing output, and it also adds some dmesg debug output so that I can fine-tune the timings before submitting it to the linux-net mailing list for inclusion in the upstream mainline kernel.

Once we know that the patched kernel produces complete output, the time taken to complete a test would also be useful, measured with time ethtool --cable-test-tdr eth2 > /dev/null.

The patch is for OpenWrt snapshot, but should also apply to earlier kernels as well as mainline Linux.

Index: linux-6.6.61/drivers/net/phy/marvell.c
===================================================================
--- linux-6.6.61.orig/drivers/net/phy/marvell.c
+++ linux-6.6.61/drivers/net/phy/marvell.c
@@ -38,6 +38,9 @@
 #include <asm/irq.h>
 #include <linux/uaccess.h>
 
+/* For timing debugging only TODO remove */
+#include <linux/time.h>
+
 #define MII_MARVELL_PHY_PAGE           22
 #define MII_MARVELL_COPPER_PAGE                0x00
 #define MII_MARVELL_FIBER_PAGE         0x01
@@ -2042,14 +2045,24 @@ static int marvell_vct5_wait_complete(st
 {
        int i;
        int val;
+       u64 entrystamp, exitstamp;
+       entrystamp = ktime_get_ns();
+
+       usleep_range(5000, 10000);
+       /* msleep(10); */
 
-       for (i = 0; i < 32; i++) {
+       for (i = 0; i < 100; i++) {
                val = __phy_read(phydev, MII_VCT5_CTRL);
                if (val < 0)
                        return val;
 
-               if (val & MII_VCT5_CTRL_COMPLETE)
+               if (val & MII_VCT5_CTRL_COMPLETE) {
+                       exitstamp = ktime_get_ns();
+                       phydev_warn(phydev, "Got VCT5 ctrl data after polling %d times\n", i);
+                       phydev_warn(phydev, "Got VCT5 after %llu ns\n", exitstamp - entrystamp);
                        return 0;
+               }
+               usleep_range(800, 1200);
        }
 
        phydev_err(phydev, "Timeout while waiting for cable test to finish\n");

If you prefer I can also build a test kernel for you to try…

I did not test any of the switch ports, only eth2, which I use as the WAN port. I will try one of the switch ports, maybe tonight…

I do understand the promised utility…

I did not check, will do so later…

Let me first repeat my tests, with an open cable… and check the messages…

If you have a cable which is not working at full speed (e.g. only at 100 Mbit instead of gigabit), or is not working at all, then this will tell you where the break is. In structured cabling (device → patch cord → fixed cabling → patch cord → device) you can tell where the problem is, i.e. how many meters it is from the device on which you run the test.
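To illustrate the structured-cabling arithmetic, here is a small sketch that maps a reported fault length onto a list of segment lengths, counted from the port where the test was run. The function name locate_fault and the example lengths are made up, and the segment lengths have to come from your own measurements or documentation:

```shell
#!/bin/sh
# locate_fault FAULT_M SEG1_M [SEG2_M ...]   (hypothetical helper)
# Prints which segment, counted from the tested port, contains the
# reported fault distance. All lengths are in meters.
locate_fault() {
    fault=$1
    shift
    printf '%s\n' "$*" | awk -v fault="$fault" '
        {
            f = fault + 0
            start = 0
            for (i = 1; i <= NF; i++) {
                end = start + $i
                if (f <= end) {
                    printf "segment %d (%.2fm to %.2fm from the port)\n", i, start, end
                    exit
                }
                start = end
            }
            printf "beyond the listed segments (total %.2fm)\n", start
        }
    '
}
```

For example, locate_fault 10.40 2 15 2 places a fault reported at 10.40m in the second segment (the fixed cabling) of a 2m + 15m + 2m run. The lengths reported in this thread are all multiples of 0.80m and one of them overshoots the nominal cable length by more than a meter, so a result close to a segment boundary should be treated as ambiguous.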

Thanks!

The TDR output (when visualised e.g. with gnuplot or a spreadsheet) allows you to see damaged but still working cables (e.g. cables which you might want to replace as preventative maintenance) - cable damage can result in reflected signals which can cause packet corruption. Badly terminated wall sockets can also cause similar problems (e.g. loose connections, corrosion, dirt, impedance mismatches).
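For getting the TDR data into gnuplot or a spreadsheet, a rough sketch of a converter from the plain-text ethtool output to CSV follows. It rests on two assumptions that should be checked against real output: that the amplitude lines look like "Pair A Amplitude    0" as seen in this thread, and that the k-th amplitude line for a given pair belongs to the k-th distance step from the "Step configuration" line. The function name tdr_to_csv is made up:

```shell
#!/bin/sh
# tdr_to_csv (hypothetical helper): convert `ethtool --cable-test-tdr`
# text output read on stdin into CSV, one row per distance step.
# ASSUMPTION: the k-th "Pair X Amplitude" line for a pair corresponds to
# distance first + (k - 1) * step, taken from the "Step configuration" line.
#
# Possible usage on the device (not run here):
#   ethtool --cable-test-tdr eth2 | tdr_to_csv > tdr.csv
tdr_to_csv() {
    awk '
        # "Step configuration: 0.80-149.73 meters in 0.80m steps"
        /^Step configuration:/ {
            split($3, range, "-")        # "0.80-149.73" -> first, last
            first = range[1] + 0
            s = $6
            sub(/m$/, "", s)             # "0.80m" -> "0.80"
            step = s + 0
            next
        }
        # "Pair A Amplitude    0"
        /^Pair [A-D] Amplitude/ {
            k = ++count[$2]              # k-th sample seen for this pair
            amp[$2, k] = $4
            if (k > rows) rows = k
        }
        END {
            print "distance_m,pair_a,pair_b,pair_c,pair_d"
            for (k = 1; k <= rows; k++)
                printf "%.2f,%s,%s,%s,%s\n", first + (k - 1) * step,
                    amp["A", k], amp["B", k], amp["C", k], amp["D", k]
        }
    '
}
```

A pair with fewer samples than the others gets empty cells, which also makes truncated output easy to spot.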

Overall it allows you to diagnose where in a cable run issues (if any) might be - to localise problems (or rule out cable issues entirely when the problem is caused by faulty devices or heavy electrical interference).

It’s not as good as a tool like a Fluke DSX-5000 but it’s certainly a lot cheaper :-).

Mmmh. I tried again: --cable-test works on eth2, but not really on the switch ports:

root@turris:~# ethtool --cable-test eth2
Cable test started for device eth2.
Cable test completed for device eth2.
Pair A code Open Circuit
Pair B code Open Circuit
Pair C code Open Circuit
Pair D code Open Circuit
Pair A, fault length: 16.80m
Pair B, fault length: 16.80m
Pair C, fault length: 16.80m
Pair D, fault length: 16.80m

This was with an open cable of nominally 15m attached (I have not confirmed the exact length, but the output is at least in the right ballpark).

root@turris:~# ethtool --cable-test-tdr eth2
Cable test TDR started for device eth2.
Cable test TDR completed for device eth2.
TDR Pulse 1000mV
Step configuration: 0.80-149.73 meters in 0.80m steps

then hangs there forever with the expected output in dmesg:

[345633.207881] Marvell 88E1510 f1072004.mdio-mii:01: Timeout while waiting for cable test to finish

Switch ports:

root@turris:~# ethtool --cable-test lan1
Cable test started for device lan1.
Cable test completed for device lan1.
Pair A code OK
Pair B code OK
Pair C code OK
Pair D code OK
root@turris:~# ethtool --cable-test lan2
Cable test started for device lan2.
Cable test completed for device lan2.
Pair A code OK
Pair B code OK
Pair C code OK
Pair D code OK
root@turris:~# ethtool --cable-test lan3
Cable test started for device lan3.
Cable test completed for device lan3.
Pair A code OK
Pair B code OK
Pair C code OK
Pair D code OK
root@turris:~# ethtool --cable-test lan4
Cable test started for device lan4.
Cable test completed for device lan4.
Pair A code OK
Pair B code OK
Pair C code OK
Pair D code OK

The problem is that lan1 had an open cable of around 10m attached, and the test failed to notice or report it.

Same issues with tdr though:

root@turris:~# ethtool --cable-test-tdr lan1
Cable test TDR started for device lan1.
Cable test TDR completed for device lan1.
TDR Pulse 1000mV
Step configuration: 0.80-149.73 meters in 0.80m steps
Pair A Amplitude    0
Pair B Amplitude    0
Pair C Amplitude    0
Pair D Amplitude    0

with dmesg showing:

[345613.147601] Marvell 88E1540 mv88e6xxx-1:01: Timeout while waiting for cable test to finish

I guess we are not done yet…

Hi, Thanks for the testing! Not sure what’s happening with the switch ports - maybe all of the tests are actually going to just one port or something?

Probably worth focusing on the non-switch port for the time being.

As for the "Timeout while waiting for cable test to finish" message - the patch I posted should fix that.

If anyone would like to test…

I’ve pushed the debug version of the patch to this branch for convenience:

This is based on OpenWrt main (i.e. the “snapshot” builds), so testers will need to build in any packages they need. At a minimum, CONFIG_PACKAGE_ethtool-full=y should be set in .config so that the full ethtool package is pre-installed in the built image.

It should be possible to tftpboot the initramfs-kernel.bin file, which gets built under bin/targets/, to test this with everything running from RAM so that flash isn’t touched.

Well, it’s better to ask for that on the OpenWrt forum, or to rebase the patch onto the Turris code, as we lag a few OpenWrt versions behind.

The impacted code hasn’t changed since it was first introduced into the kernel, so I think the same patch should apply to the Turris versions.

With the Turris sources checked out locally, and with the branch you want to build selected…

git remote add timsmalldevtree https://github.com/tim-seoss/openwrt
git fetch timsmalldevtree
git cherry-pick 3d901cd69648e41cfc1b97e305311d0f7692870e
git log -p -n 4 #just to check...

If that doesn’t work, then please let me know…

OK, I need to rebuild the kernel anyway, so I might cherry-pick your patch and we will see if it applies cleanly on 5.15.148. Will let you know later on.

I’ve created an OpenWrt GitHub issue here: "ethtool --cable-test-tdr fails with Timeout kernel message on Huawei AP5030DN and MikroTik RB5009UPr+S+IN" (Issue #17077, openwrt/openwrt).