Hi Community,
TL;DR This module was not working right so I tried to improve it. Here is a patch:
cat 788-fix-quirks-for-HALNy-SFP-module.patch
From: "AreYouLoco?" <areyouloco@localhost.tld>
Date: Wed, 27 Dec 2023 17:10:12 +0100
Subject: [PATCH 1/1] net: sfp: add more quirks for HALNy GPON SFP
It seems that RX_LOS signal is indeed inverted. But TX_FAULT purpose
is simply unknown. Add more quirks to fully support that broken module.
And possibly fix 2,5Gbps as the module is capable of it.
Signed-off-by: AreYouLoco? <areyouloco@localhost.tld>
--- a/drivers/net/phy/sfp.c
+++ b/drivers/net/phy/sfp.c
@@ -335,11 +335,11 @@
static void sfp_fixup_halny_gsfp(struct sfp *sfp)
{
- /* Ignore the TX_FAULT and LOS signals on this module.
- * these are possibly used for other purposes on this
- * module, e.g. a serial port.
- */
- sfp->state_hw_mask &= ~(SFP_F_TX_FAULT | SFP_F_LOS);
+ /* Ignore the TX_FAULT, invert LOS on this module.
+ * and fix long startup */
+ sfp_fixup_long_startup(sfp);
+ sfp_fixup_ignore_tx_fault(sfp);
+ sfp->state_hw_mask &= ~SFP_F_LOS;
}
static void sfp_quirk_2500basex(const struct sfp_eeprom_id *id,
@@ -379,6 +379,7 @@
}, {
.vendor = "HALNy",
.part = "HL-GSFP",
+ .modes = sfp_quirk_2500basex,
.fixup = sfp_fixup_halny_gsfp,
}, {
// Huawei MA5671A can operate at 2500base-X, but report 1.2GBd
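To make the effect of the three added lines easier to follow, here is a small user-space model of the fixup. It is only a sketch under assumptions: `struct sfp_model`, the `model_*` functions, the flag bit positions and `LONG_STARTUP_MS` are hypothetical stand-ins, and the behaviour attributed to `sfp_fixup_long_startup()` and `sfp_fixup_ignore_tx_fault()` is how those helpers appear to work in sfp.c, which may differ between kernel versions. One thing the model makes visible: clearing `SFP_F_LOS` from `state_hw_mask` makes the driver stop looking at LOS, it does not invert it.

```c
#include <stdbool.h>

/* Illustrative flag bits; the real definitions live in drivers/net/phy/sfp.c. */
enum {
	SFP_F_LOS      = 1u << 1,
	SFP_F_TX_FAULT = 1u << 2,
};

/* Hypothetical stand-in for the few struct sfp fields the fixup touches. */
struct sfp_model {
	unsigned int state_hw_mask;	/* which input pins the driver reads */
	bool tx_fault_ignore;		/* true: TX_FAULT is never acted on */
	unsigned int t_start_up_ms;	/* time allowed for module start-up */
};

/* Placeholder value for the model, not taken from the kernel source. */
#define LONG_STARTUP_MS 50000u

/* Mirrors the three statements of the patched sfp_fixup_halny_gsfp(). */
void model_fixup_halny_gsfp(struct sfp_model *sfp)
{
	sfp->t_start_up_ms = LONG_STARTUP_MS;	/* assumed effect of sfp_fixup_long_startup() */
	sfp->tx_fault_ignore = true;		/* assumed effect of sfp_fixup_ignore_tx_fault() */
	sfp->state_hw_mask &= ~SFP_F_LOS;	/* stop reading the LOS pin */
}

/* What the driver would act on for a given raw pin state. */
unsigned int model_visible_state(const struct sfp_model *sfp,
				 unsigned int raw_pins)
{
	unsigned int state = raw_pins & sfp->state_hw_mask;

	if (sfp->tx_fault_ignore)
		state &= ~SFP_F_TX_FAULT;
	return state;
}
```

In this model, after `model_fixup_halny_gsfp()` has run, `model_visible_state()` returns 0 for both LOS and TX_FAULT no matter what the pins say, so neither signal can bring the link down.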
Longer story: Since I started using this module I have experienced disconnections from time to time. I blamed netifd for them, but netifd bringing the interface down and back up was just an after-effect of the module signalling TX_FAULT on pin 2. From time to time the link also failed to come up, but I didn’t have time to look into it until a few days ago, so I just rebooted the router, which helped every time.
root@router:~# dmesg -T | grep sfp
[Sat Dec 23 19:13:24 2023] sfp sfp: Host maximum power 3.0W
[Sat Dec 23 19:13:24 2023] sfp sfp: module HALNy HL-GSFP rev V1.0 sn HALN1010493c dc 20150525
[Sun Dec 24 02:17:35 2023] sfp sfp: module transmit fault indicated
[Sun Dec 24 02:17:37 2023] sfp sfp: module transmit fault recovered
[Sun Dec 24 17:46:55 2023] sfp sfp: module transmit fault indicated
[Sun Dec 24 17:46:56 2023] sfp sfp: module transmit fault recovered
[Mon Dec 25 01:04:57 2023] sfp sfp: module transmit fault indicated
[Mon Dec 25 01:04:58 2023] sfp sfp: module transmit fault recovered
[Mon Dec 25 01:13:53 2023] sfp sfp: module transmit fault indicated
[Mon Dec 25 01:14:09 2023] sfp sfp: module transmit fault indicated
[Mon Dec 25 01:14:10 2023] sfp sfp: module transmit fault recovered
[Mon Dec 25 03:40:52 2023] sfp sfp: module transmit fault indicated
[Mon Dec 25 03:40:53 2023] sfp sfp: module transmit fault recovered
[Mon Dec 25 06:22:02 2023] sfp sfp: module transmit fault indicated
[Mon Dec 25 06:22:03 2023] sfp sfp: module transmit fault recovered
[Mon Dec 25 06:22:05 2023] sfp sfp: module transmit fault indicated
[Mon Dec 25 08:40:51 2023] sfp sfp: module transmit fault indicated
[Mon Dec 25 08:40:52 2023] sfp sfp: module transmit fault recovered
[Mon Dec 25 12:23:41 2023] sfp sfp: module transmit fault indicated
[Mon Dec 25 12:23:42 2023] sfp sfp: module transmit fault recovered
[Mon Dec 25 14:10:49 2023] sfp sfp: module transmit fault indicated
[Mon Dec 25 14:10:51 2023] sfp sfp: module transmit fault recovered
[Mon Dec 25 16:20:48 2023] sfp sfp: module transmit fault indicated
[Mon Dec 25 17:38:10 2023] sfp sfp: module transmit fault indicated
[Mon Dec 25 17:38:12 2023] sfp sfp: module transmit fault recovered
[Mon Dec 25 21:17:47 2023] sfp sfp: module transmit fault indicated
[Mon Dec 25 21:17:49 2023] sfp sfp: module transmit fault recovered
[Mon Dec 25 22:17:10 2023] sfp sfp: module transmit fault indicated
[Mon Dec 25 22:17:11 2023] sfp sfp: module transmit fault recovered
[Tue Dec 26 00:18:47 2023] sfp sfp: module transmit fault indicated
[Tue Dec 26 01:00:15 2023] sfp sfp: module transmit fault indicated
[Tue Dec 26 01:00:16 2023] sfp sfp: module transmit fault recovered
[Tue Dec 26 01:32:47 2023] sfp sfp: module transmit fault indicated
[Tue Dec 26 01:32:48 2023] sfp sfp: module transmit fault recovered
[Tue Dec 26 07:41:24 2023] sfp sfp: module transmit fault indicated
[Tue Dec 26 07:41:26 2023] sfp sfp: module transmit fault recovered
[Tue Dec 26 10:25:04 2023] sfp sfp: module transmit fault indicated
[Tue Dec 26 11:01:09 2023] sfp sfp: module transmit fault indicated
[Tue Dec 26 11:01:11 2023] sfp sfp: module transmit fault recovered
[Tue Dec 26 11:04:44 2023] sfp sfp: module transmit fault indicated
[Tue Dec 26 11:04:45 2023] sfp sfp: module transmit fault recovered
[Tue Dec 26 12:42:48 2023] sfp sfp: module transmit fault indicated
[Tue Dec 26 12:42:50 2023] sfp sfp: module transmit fault recovered
[Wed Dec 27 00:06:19 2023] sfp sfp: module transmit fault indicated
[Wed Dec 27 02:07:07 2023] sfp sfp: module transmit fault indicated
[Wed Dec 27 02:07:08 2023] sfp sfp: module transmit fault recovered
[Wed Dec 27 09:24:38 2023] sfp sfp: module transmit fault indicated
[Wed Dec 27 09:24:39 2023] sfp sfp: module transmit fault recovered
This is what the log looks like now, with my script running in the background. I don’t have to reboot anymore and I get more or less constant connectivity. But we can do better than that, so here is how this patch came to life.
Continuing. I used to reboot the router quite often, so I didn’t run into this much before, but now it is more or less configured and runs constantly, not counting power outages. So during the holidays I looked at the logs: clearly, after 5 indicated tx_faults the router sends tx_disable to the module, and that is when I lost the link before. So I wrote a script that monitors /sys/kernel/debug/sfp/state, and when the countdown has 1 left it brings the interface down, then the link down, and then brings them back up in reverse order. That way I don’t have to reboot anymore, because the counter resets back to 5.
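For anyone who wants to reproduce the workaround, here is a minimal sketch of the check such a script performs; it is not the actual script. It assumes the countdown is the line labelled `Fault recovery remaining retries:` in the debugfs state file, which is how mainline sfp.c appears to print it, so compare it with the output of `cat /sys/kernel/debug/sfp/state` on your kernel first. The function names are made up for the example.

```c
#include <stdio.h>
#include <string.h>

/* Assumed label of the countdown line in the debugfs state file. */
#define RETRIES_LABEL "Fault recovery remaining retries:"

/*
 * Return the remaining fault-recovery retries found in the text of the
 * state file, or -1 if the line is missing or malformed.
 */
int parse_fault_retries(const char *text)
{
	const char *p = strstr(text, RETRIES_LABEL);
	int retries;

	if (!p)
		return -1;
	if (sscanf(p + strlen(RETRIES_LABEL), "%d", &retries) != 1)
		return -1;
	return retries;
}

/* Read the whole state file (it is small) and parse the countdown. */
int read_fault_retries(const char *path)
{
	char buf[2048];
	size_t n;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return parse_fault_retries(buf);
}

/* Act when exactly one retry is left before the driver sends tx_disable. */
int should_bounce_link(int retries)
{
	return retries == 1;
}
```

On the router, a loop would call `read_fault_retries("/sys/kernel/debug/sfp/state")` every second or so and, once `should_bounce_link()` returns true, take the interface down and bring it back up. That part is left out here because the exact commands depend on the setup.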
@backon figured out here that if we add a 45 second delay to U-Boot, we also don’t have to soft reboot after a power outage, because the SFP gets time to boot correctly. But there is already code for that in sfp.c, so I just used it: what @rmk wrote for BAD_GPON.
Anyway, to keep this short: @rmk, could you take a look at the patch, which is based on your work, and let me know whether it is syntactically correct C? Also, @mbehun, if @rmk gives the green light, could you push it within the Turris team? The easiest way would be to make an experimental branch, let’s call it crashlab-sfp, based on hbs but with the above patch, so I can test it. I tried to build my own medkit to test it in advance, but I simply failed, and the Turris docs are not very friendly.
Also, I don’t rule out a hardware failure of this SFP module; maybe the laser is indeed broken. But I don’t think so, as it has been like this from the beginning. To rule that out I contacted my ISP, and after New Year’s they are going to deliver a brand new unit. If it behaves the same as the old one, then this module definitely has a broken tx_fault implementation. And it is not inverted, it is simply something else that we don’t know yet. The above patch should fix as much as possible. Until then, @mbehun, could you please already make a test branch so it can build in the meantime?
Cheers!
AreYouLoco?