Can't get 1Gbit on Mox Classic

moeller0 · November 1, 2019, 6:40pm

root@turris:~# cat /etc/os-release 
NAME="TurrisOS"
VERSION="4.0.1"
ID="turrisos"
ID_LIKE="lede openwrt"
PRETTY_NAME="TurrisOS 4.0.1"
VERSION_ID="4.0.1"
HOME_URL="https://www.turris.cz/"
BUG_URL="https://gitlab.labs.nic.cz/groups/turris/-/issues"
SUPPORT_URL="https://www.turris.cz/support/"
BUILD_ID="80076f9"
LEDE_BOARD="mvebu/cortexa9"
LEDE_ARCH="arm_cortex-a9_vfpv3"
LEDE_TAINTS="busybox"
LEDE_DEVICE_MANUFACTURER="CZ.NIC"
LEDE_DEVICE_MANUFACTURER_URL="https://www.turris.cz"
LEDE_DEVICE_PRODUCT="Turris Omnia"
LEDE_DEVICE_REVISION="v0"
LEDE_RELEASE="TurrisOS 4.0.1 80076f9"


root@turris:~# uname -a
Linux turris 4.14.148 #0 SMP Tue Oct 8 23:24:25 2019 armv7l GNU/Linux
root@turris:~# cat /etc/hotplug.d/net/20-smp-tune 
#!/bin/sh
[ "$ACTION" = add ] || exit

NPROCS="$(grep -c "^processor.*:" /proc/cpuinfo)"
[ "$NPROCS" -gt 1 ] || exit

PROC_MASK="$(( (1 << $NPROCS) - 1 ))"

find_irq_cpu() {
	local dev="$1"
	local match="$(grep -m 1 "$dev\$" /proc/interrupts)"
	local cpu=0

	[ -n "$match" ] && {
		set -- $match
		shift
		for cur in `seq 1 $NPROCS`; do
			[ "$1" -gt 0 ] && {
				cpu=$(($cur - 1))
				break
			}
			shift
		done
	}

	echo "$cpu"
}

set_hex_val() {
	local file="$1"
	local val="$2"
	val="$(printf %x "$val")"
	[ -n "$DEBUG" ] && echo "$file = $val"
	echo "$val" > "$file"
}

default_ps="$(uci get "network.@globals[0].default_ps")"
[ -n "$default_ps" -a "$default_ps" != 1 ] && exit 0

exec 512>/var/lock/smp_tune.lock
flock 512 || exit 1

for dev in /sys/class/net/*; do
	[ -d "$dev" ] || continue

	# ignore virtual interfaces
	[ -n "$(ls "${dev}/" | grep '^lower_')" ] && continue
	[ -d "${dev}/device" ] || continue

	device="$(readlink "${dev}/device")"
	device="$(basename "$device")"
	irq_cpu="$(find_irq_cpu "$device")"
	irq_cpu_mask="$((1 << $irq_cpu))"

	for q in ${dev}/queues/rx-*; do
		set_hex_val "$q/rps_cpus" "$(($PROC_MASK & ~$irq_cpu_mask))"
	done

	ntxq="$(ls -d ${dev}/queues/tx-* | wc -l)"

	idx=$(($irq_cpu + 1))
	for q in ${dev}/queues/tx-*; do
		set_hex_val "$q/xps_cpus" "$((1 << $idx))"
		let "idx = idx + 1"
		[ "$idx" -ge "$NPROCS" ] && idx=0
	done
done

So it looks like TOS4 inherited the new rps distribution code from OpenWrt (as it should).
But while the new method ostensibly makes sense, it starts to fail once the router comes under heavy sirq load, such as from wifi and especially traffic shaping, because then everything competes for a single CPU while the other idles. Sure, processing packets that came in on CPU0 on CPU1 causes cross-CPU traffic and is not ideal with regard to caching, but if the processing CPU is maxed out, accepting these inefficiencies still yields greater performance. So at least for mvebu and ipq8XXX devices the new defaults are less than ideal. I am not sure whether the anything-goes setting that “echo 3” essentially configures on dual-core devices is the optimum, but it certainly helps once the router gets into heavy lifting. With internet access links getting faster and faster, I believe the heavy-lifting scenario will only get more common…
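
For reference, the value written to rps_cpus is a hexadecimal CPU bitmask, so “echo 3” (binary 11) enables both CPU0 and CPU1. The sketch below compares the mask the stock script computes with an all-CPU mask. The function names are made up for illustration; the arithmetic is copied from the PROC_MASK and irq_cpu_mask lines of the script above.

```shell
#!/bin/sh
# Sketch: compare the RPS mask the stock script computes with an all-CPU mask.
# stock_rps_mask and all_cpus_mask are illustrative names, not part of OpenWrt.

# Mask 20-smp-tune writes to rps_cpus: all CPUs except the one taking the IRQ.
# $1: number of CPUs, $2: CPU handling the device IRQ (0-based).
stock_rps_mask() {
	printf '%x\n' "$(( ((1 << $1) - 1) & ~(1 << $2) ))"
}

# Mask with every CPU enabled.
# $1: number of CPUs.
all_cpus_mask() {
	printf '%x\n' "$(( (1 << $1) - 1 ))"
}

stock_rps_mask 2 0   # prints 2 (CPU1 only)
all_cpus_mask 2      # prints 3 (CPU0 and CPU1)
```

On a dual-core device whose NIC interrupt lands on CPU0, the stock script therefore steers RPS to CPU1 only, while the all-CPU mask lets either core pick up the work.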

lucenera · November 1, 2019, 6:45pm

Should I give those commands from SSH on Turris OS 3.11.8? What exactly do they do? Is there a way to make the changes permanent?

moeller0 · November 1, 2019, 7:58pm

I believe that /etc/hotplug.d/net/20-smp-tune might not be ideal for mvebu SoCs like the Omnia, but the permanent solution is to either change 20-smp-tune or add another script to tune on mvebu…
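
A rough sketch of what such an additional script could look like follows. The file name 21-rps-all-cpus and the function name are assumptions, as is the idea that it runs after 20-smp-tune because hotplug scripts are picked up in file-name order. It has not been tested on a MOX or an Omnia.

```shell
#!/bin/sh
# Hypothetical /etc/hotplug.d/net/21-rps-all-cpus (the name is an assumption;
# it is chosen to sort after 20-smp-tune so that it can override its values).

# Write an all-CPU mask to rps_cpus of every physical interface.
# $1: sysfs net directory, $2: number of CPUs.
apply_rps_all() {
	mask="$(printf '%x' "$(( (1 << $2) - 1 ))")"
	for dev in "$1"/*; do
		# same filter as 20-smp-tune: skip interfaces without a backing device
		[ -d "$dev/device" ] || continue
		for q in "$dev"/queues/rx-*; do
			[ -e "$q/rps_cpus" ] && echo "$mask" > "$q/rps_cpus"
		done
	done
	return 0
}

# Only act on "add" events, like the stock script.
if [ "$ACTION" = add ]; then
	apply_rps_all /sys/class/net "$(grep -c '^processor' /proc/cpuinfo)"
fi
```

Keeping the override in a separate file leaves 20-smp-tune untouched, which may make it easier to undo.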

neheb · November 2, 2019, 2:50am

I’ll just leave these here:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/net/ethernet/marvell/mvneta.c?h=v4.19.81&id=c307e2a895c9ce4040e68f034008c289209ce482

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/net/ethernet/marvell/mvneta.c?h=v4.19.81&id=7e47fd84b56bb37ff1c3d9ab49c2fff5ee4b3077

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/net/ethernet/marvell/mvneta.c?h=v4.19.81&id=0f5c6c30a0f8c629b92ecdaef61b315c43fde10a

moeller0 · November 2, 2019, 11:05am

Thanks, these seem relevant, and as far as I can tell they will come in once TOS is based on either the elusive 19.07 or whatever comes after that, unless someone backports them.

respi · October 3, 2020, 5:42pm

One year, a lot of testing, and regular correspondence with the MOX support team later, here is an update on the situation:
Support was not helpful at all. Every few months I would get an email asking whether the issue still exists, so I would reply with “yes it still exists” and provide renewed speed tests and my MOX logs. I factory reset the MOX multiple times and tried minimal installations. After that nothing would happen, and after a few months the cycle would repeat.
Last month there was a breakthrough: I was asked to send my MOX in, so I did. But the result was really disappointing.
First, the complaint report stated that there was an issue with the SDIO Wi-Fi (which was not the reason why I sent it in).
Second, the external antennas of the SDIO Wi-Fi were not connected.
And the WAN bottleneck still persists. Here are my latest iperf3 WAN tests:

Accepted connection from 192.168.2.20, port 57413
[ 5] local 192.168.2.30 port 5201 connected to 192.168.2.20 port 57414

[ ID] Interval Transfer Bitrate
[ 5] 0.00-10.04 sec 402 MBytes 336 Mbits/sec receiver

To replicate the WAN test, follow this guide: https://www.tp-link.com/se/support/faq/2408/
You may have to set a firewall rule to allow your devices to talk to each other.

TL;DR: Support was not helpful, even when I sent the MOX in. I’m still not able to get 1Gbit/s through WAN regardless of installed packages.

ordex · September 6, 2021, 10:45pm

Hi! @respi did you see any progress on this?
I just got a MOX (Classic + C) and I also can’t get beyond 450/470Mbps download.

The poor TP-Link I was using before the MOX was able to get me to 920/940Mbps steady.

My WAN is a PPPoE connection (with VLAN) while on the LAN side there is nothing special.
The test is being conducted from my PC (wired to the LAN port of the MOX) to the WAN (speedtest.net).

top does not show any process consuming more CPU than expected.

What I noted is that I have dropped packets in RX:

root@turris:~# ip -s l show pppoe-WAN
66: pppoe-WAN: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1492 qdisc fq_codel state UNKNOWN mode DEFAULT group default qlen 3
    link/ppp 
    RX: bytes  packets  errors  dropped overrun mcast   
    4757055526 3576658  0       38662   0       0       
    TX: bytes  packets  errors  dropped carrier collsns 
    1090023033 1423352  0       0       0       0       
root@turris:~# ip -s l show lan3.835
65: lan3.835@lan3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1510 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether d8:58:d7:00:cd:df brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast   
    4785688022 3576847  0       0       0       0       
    TX: bytes  packets  errors  dropped carrier collsns 
    1121374079 1423552  0       0       0       0       
root@turris:~# ip -s l show lan3
6: lan3@eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1510 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether d8:58:d7:00:cd:df brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast   
    11601792001 8641915  0       150242  0       0       
    TX: bytes  packets  errors  dropped carrier collsns 
    2526609006 3193342  0       0       0       0
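
To put those drop counters in proportion, the same counters can be read from sysfs and turned into a percentage. This is only a small sketch; rx_drop_pct is a made-up helper name and the optional second argument exists purely to point it at a different sysfs root. With the figures above it works out to roughly 1.08% for pppoe-WAN and 1.74% for lan3.

```shell
#!/bin/sh
# Sketch: RX drop percentage for one interface from its sysfs counters.
# $1: interface name, $2: optional sysfs root (default /sys/class/net).
rx_drop_pct() {
	stats="${2:-/sys/class/net}/$1/statistics"
	awk -v d="$(cat "$stats/rx_dropped")" -v p="$(cat "$stats/rx_packets")" \
		'BEGIN { if (p > 0) printf "%.2f\n", 100 * d / p; else print "n/a" }'
}
```

For example, `rx_drop_pct lan3` prints the share of received packets that were dropped on lan3 since the counters were last reset.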

Any idea where I could dig next?

edit: I am on TurrisOS 5.2.6

moeller0 · September 7, 2021, 9:55am

Could you please post the output of:

cat /proc/interrupts

ordex · September 7, 2021, 10:08am

sure:

           CPU0       CPU1
  3:    5970553    6489395     GICv3  30 Level     arch_timer
  6:          0          0     GICv3  23 Level     arm-pmu
  7:      10035          0     GICv3  32 Level     d0010600.spi
  8:        792          0     GICv3  33 Level     d0011000.i2c
  9:         70          0     GICv3  43 Level     serial
 10:   12884186          0     GICv3  74 Level     eth0
 11:    8272625          0     GICv3  77 Level     eth1
 12:          0          0     GICv3  35 Level     xhci-hcd:usb2
 13:          0          0     GICv3  49 Level     ehci_hcd:usb1
 20:         24          0     GICv3  50 Level     armada-37xx-rwtm-mailbox
 21:         52          0     GICv3  57 Level     mmc0
 22:       8596          0     GICv3  58 Level     mmc1
 40:          2          0     GICv3  79 Level     d0060900.xor
 41:          2          0     GICv3  80 Level     d0060900.xor
 42:         10          0     GPIO2   5 Edge      d0032004.mdio-mii:02
 50:          0          9  mv88e6xxx-g1   7 Edge      mv88e6xxx-g2
 53:          0          3  mv88e6xxx-g2   1 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@2!mdio:11
 54:          0          0  mv88e6xxx-g2   2 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@2!mdio:12
 55:          0          6  mv88e6xxx-g2   3 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@2!mdio:13
 56:          0          0  mv88e6xxx-g2   4 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@2!mdio:14
 67:          0          0  mv88e6xxx-g2  15 Edge      mv88e6xxx-watchdog
 68:          1          0     GPIO1  10 Edge      d00d8000.sdhci cd
 69:          0          0     GPIO2  20 Edge      gpio-keys
IPI0:    514109     517617       Rescheduling interrupts
IPI1:       155    2014842       Function call interrupts
IPI2:         0          0       CPU stop interrupts
IPI3:         0          0       CPU stop (for crash dump) interrupts
IPI4:         0          0       Timer broadcast interrupts
IPI5:   1064667    1150326       IRQ work interrupts
IPI6:         0          0       CPU wake-up interrupts
Err:          0

respi · September 7, 2021, 6:06pm

I got no further with my MOX. It is only used as a Wi-Fi access point right now. After each new TOS release I check the WAN speeds, but so far I have seen no improvement.

Feel free to document your journey in the forum, it might help to solve this issue.

moeller0 · September 7, 2021, 7:22pm

There is quite a lot of interrupt processing on CPU0, maybe moving eth0 or eth1 onto CPU1 might help… but that will only work if your mox is saturating one CPU during your throughput tests.
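
A minimal sketch of how such a move could be attempted, assuming the usual /proc/irq/N/smp_affinity interface is available. The helper names irq_of and pin_irq are made up, the optional last argument only allows pointing them at a different /proc root, and nothing in this thread confirms that the MOX's interrupt controller accepts the write.

```shell
#!/bin/sh
# Sketch: pin the IRQ of a named device to one CPU via smp_affinity.
# irq_of and pin_irq are illustrative names, not existing tools.

# Print the IRQ number whose /proc/interrupts line ends in device name $1.
# $2: optional /proc root (default /proc).
irq_of() {
	awk -v dev="$1" '$NF == dev { sub(":", "", $1); print $1; exit }' \
		"${2:-/proc}/interrupts"
}

# Pin the IRQ of device $1 to CPU $2 (0-based).
# $3: optional /proc root (default /proc).
pin_irq() {
	irq="$(irq_of "$1" "${3:-/proc}")"
	[ -n "$irq" ] || return 1
	# smp_affinity takes a hex CPU bitmask: CPU1 -> 2
	printf '%x\n' "$((1 << $2))" > "${3:-/proc}/irq/$irq/smp_affinity"
}
```

Run as root, `pin_irq eth1 1` would look up the eth1 interrupt and write mask 2 to its smp_affinity file; the setting does not survive a reboot.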

ordex · September 8, 2021, 5:50am

@moeller0 The driver/kernel should take care of using either CPU without the user doing anything, no?
How would you suggest to “move eth0 or eth1 onto CPU1”?

@respi I reached out to support and I was basically told that this is a known limitation at the moment. They are hoping that improvements will come with a new kernel/driver along with new releases of TurrisOS.
I am gonna see if they have open tickets and what not to check them and potentially test.

tac2 · September 9, 2021, 6:47pm

MOX A+D
As I recently got my connection upgraded I made some simple tests with a couple of speed test services in the browser on my laptop.
My MOX is in router mode.
Results span from about 700 to 980

wowbaggerHU · October 2, 2021, 6:14pm

Hello Everyone,

I found this topic via Google, and I seem to have the same problem:
I own a first-gen Turris Omnia, and now about five years later, I bought a MOX (Start+G+E+E+B) and I’m in the process of setting it up to take over for the Omnia.
At this point, both are connected in parallel to the same device (modem/gateway) provided by the service provider.
The Omnia has a number of devices connected to it, and there I’m using a brand new laptop connected to it (the Omnia) via a Cat5e gigabit connection.
On the MOX, I have a single Thinkpad T470 hooked up, over a Cat6 gigabit connection.

If I do a speedtest (go to speedtest.net over a browser) on the Omnia, I get 936 Mbit/s down and 41 Mbit/s up, while on the MOX I only get 457 Mbit/s down and 41 Mbit/s up. In both cases I use the same Speedtest server.
At this point, I can only think that the only difference is that the hardware of the Omnia may still be more capable than that of the MOX.

Another difference is that MOX is running TurrisOS 5.2.7 and my Omnia has TurrisOS 3.11.23.