[5.2.4] [PATCH] WiFi fails after a while with "wmi mgmt tx queue is full"

Turris MOX, SW 5.2.0

Jun  5 00:21:03 turris kernel: [593387.355757] ath10k_pci 0000:00:00.0: wmi mgmt tx queue is full
Jun  5 00:21:03 turris kernel: [593387.362132] ath10k_pci 0000:00:00.0: failed to transmit packet, dropping: -28
Jun  5 00:21:03 turris kernel: [593387.369771] ath10k_pci 0000:00:00.0: failed to submit frame: -28
Jun  5 00:21:03 turris kernel: [593387.376050] ath10k_pci 0000:00:00.0: failed to transmit frame: -28
Jun  5 00:21:03 turris kernel: [593387.383442] ath10k_pci 0000:00:00.0: wmi mgmt tx queue is full
Jun  5 00:21:03 turris kernel: [593387.389562] ath10k_pci 0000:00:00.0: failed to transmit packet, dropping: -28
Jun  5 00:21:03 turris kernel: [593387.397002] ath10k_pci 0000:00:00.0: failed to submit frame: -28
Jun  5 00:21:03 turris kernel: [593387.403046] ath10k_pci 0000:00:00.0: failed to transmit frame: -28
Jun  5 00:21:03 turris kernel: [593387.410192] ath10k_pci 0000:00:00.0: wmi mgmt tx queue is full
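
For anyone triaging a similar log, here is a rough Python sketch that tallies the ath10k messages and decodes the trailing error number. The function names (`summarize`, `describe`) and the regular expressions are my own illustration, not part of any Turris or ath10k tooling, and the pattern assumes syslog-style lines formatted like the ones above. On Linux, error 28 is ENOSPC, which fits a "queue is full" message.

```python
import errno
import re
from collections import Counter

# Assumed line shape (taken from the log excerpt above):
# "<date> <host> kernel: [<uptime>] ath10k_pci <pci-address>: <message>"
LINE_RE = re.compile(
    r"kernel: \[\s*(?P<uptime>\d+\.\d+)\] ath10k_pci \S+ (?P<msg>.*)$"
)
# Trailing negative error number, e.g. "...: -28"
ERR_RE = re.compile(r": -(\d+)$")


def summarize(lines):
    """Return (message counts, set of error numbers, uptime of first match)."""
    counts = Counter()
    codes = set()
    first_uptime = None
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not an ath10k_pci kernel line
        msg = m.group("msg").strip()
        err = ERR_RE.search(msg)
        if err:
            codes.add(int(err.group(1)))
            msg = msg[:err.start()]  # drop the ": -28" suffix
        counts[msg] += 1
        if first_uptime is None:
            first_uptime = float(m.group("uptime"))
    return counts, codes, first_uptime


def describe(code):
    """Map a positive error number to its symbolic name on this platform."""
    return errno.errorcode.get(code, "UNKNOWN")
```

Feeding it the lines above (for example via `summarize(open(path))`, where `path` is wherever your syslog is stored) gives the per-message counts and the kernel uptime in seconds at the first matching line, which helps to tell how long the radio had been running before the queue filled up.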

I tried a reboot, but it didn’t boot anymore. A physical power OFF/ON worked.

This might be a case where the alternative core drivers from Candela Technologies would help.
See reForis / Package Management / Packages / Alternative core drivers.

It occurred again (and probably a week ago too, but I had no time to investigate then, so I just quickly rebooted the MOX).

It seems that WiFi fails after it has been running for some time.

OS: 5.2.4
Kernel: 4.14.236

Reference to the old thread: Unstable WiFi on MOX B

Could you please apply this patch, which seems to fix the issue: [v2] ath10k: fix wmi mgmt tx queue full due to race condition - Patchwork (2020/12/22)
It is also already merged in the mainline/stable kernels: kernel/git/stable/linux.git - Linux kernel stable tree

This patch has been in our distribution for some time. You can check it in the latest OpenWrt 19.07 branch.
The wireless drivers come from mac80211 version 4.19.193 (updated 2 months ago), and the patch is present there.

EDIT: OK, I see it, it’s applied.
EDIT2: I see that the “fix” was added in 5.2.2, while I reported this issue on 5.2.0 and it seems to persist through 5.2.4, so my first guess is that the issue is somewhere else.