SQM and cake in TurrisOS 5.2.7 - far lower speed than the link

I have an asymmetric FTTH connection (1 Gbit/s down / 210 Mbit/s up) on which I have been using cake and SQM since TurrisOS 3.x. An unfortunate quirk of this connection is that there is a double NAT (ISP device → DMZ on the Omnia); I mention this to put the matter into context for what follows.

My SQM configuration is:

config queue 'eth1'
        option interface 'eth1'
        option ingress_ecn 'ECN'
        option egress_ecn 'ECN'
        option itarget 'auto'
        option etarget 'auto'
        option linklayer 'none'
        option enabled '1'
        option download '950000'
        option upload '190000'
        option debug_logging '0'
        option verbosity '5'
        option qdisc 'cake'
        option script 'piece_of_cake.qos'
        option qdisc_advanced '1'
        option squash_dscp '1'
        option squash_ingress '1'
        option qdisc_really_really_advanced '1'
        option iqdisc_opts 'nat dual-dsthost'
        option eqdisc_opts 'nat dual-srchost'

However, when testing with both dslreports and librespeed, there is a huge drop in DL speed if I enable SQM:

  • Without SQM: DL ~900 Mbps, UL 210 Mbps
  • With SQM: DL between ~190 Mbps and ~500 Mbps, UL 190 Mbps

As far as I remember, this did not happen with TurrisOS 3.x (but I might be misremembering).

Is there something wrong in this configuration? Pinging the SQM expert @moeller0 here.

Mmmh, so in my testing with TOS4 my omnia allowed for bi-directional traffic shaping up to 550/550, and others reported that unidirectional shaping works up to ~1 Gbps. But this only works if the omnia is not trying to process everything on a single CPU and if not too much else is eating up CPU cycles; the PaKon feature in particular is quite CPU intensive (as is running containers for other purposes).
Under https://openwrt.org/docs/guide-user/network/traffic-shaping/sqm-details in the FAQ there is a description of how to use htop to monitor CPU usage in an interactive way. Maybe the best start would be for you to run a speedtest while looking at the htop output in a terminal window. The trick is to look at the idle percentage (or the black part in the colored display): if that falls too low for one or both of the CPUs, you have a problem. Cake does not require loads of CPU throughput, but it needs access to the CPU with fairly short deadlines; if it does not get that, the symptoms are both more latency increase under load (aka bufferbloat) and less throughput.
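
In case htop is not installed yet, something like the following should get you going (opkg is the package manager on TurrisOS; and if you prefer not to install anything, the stock busybox top shows a "sirq" column in its CPU summary line, which should be good enough to spot softirq saturation):

# install and run htop (just a sketch, run as root)
opkg update && opkg install htop
htop
# or, without installing anything, sample the CPU summary once per second
top -d 1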

Side note: you really should also configure the per-packet overhead to something sane, but that is orthogonal to your issue.
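
For reference, something along these lines in /etc/config/sqm would at least account for ethernet framing; the value 44 is only the usual "if in doubt" catch-all, not something measured for your link:

config queue 'eth1'
        # ... rest of the section unchanged ...
        option linklayer 'ethernet'
        option overhead '44'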

Also please post the following:

  1. Disable SQM:
    `/etc/init.d/sqm stop`

  2. Run a dslreports and/or waveform speedtest with SQM disabled. Please post links to the detailed results (for waveform, use the “Share your Results” link on the results page).

  3. Enable SQM:
    `/etc/init.d/sqm start`

  4. `ifstatus wan`

  5. `tc -s qdisc`

  6. Run and post a speedtest again, as in 2), but this time with SQM enabled

  7. `tc -s qdisc`

There’s some significant CPU usage in softirqs when the test is running with SQM enabled (up to 40%). Not sure if it’s related or not. The rest is mostly idle.

First: PEBKAC, because the WAN interface changed in 5.x; it is now eth2 instead of eth1. After changing that it is better, but I still drop to ~500 Mbps from ~900.
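
(i.e., the only change needed in /etc/config/sqm should be the interface option:)

config queue 'eth1'
        option interface 'eth2'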

Reports with SQM disabled: Bufferbloat Test by Waveform
Reports with SQM enabled: Bufferbloat Test by Waveform

ifstatus wan:

{
	"up": true,
	"pending": false,
	"available": true,
	"autostart": true,
	"dynamic": false,
	"uptime": 10069,
	"l3_device": "eth2",
	"proto": "dhcp",
	"device": "eth2",
	"updated": [
		"addresses",
		"routes",
		"data"
	],
	"metric": 0,
	"dns_metric": 0,
	"delegation": true,
	"ipv4-address": [
		{
			"address": "192.168.1.138",
			"mask": 24
		}
	],
	"ipv6-address": [
		
	],
	"ipv6-prefix": [
		
	],
	"ipv6-prefix-assignment": [
		
	],
	"route": [
		{
			"target": "0.0.0.0",
			"mask": 0,
			"nexthop": "192.168.1.254",
			"source": "192.168.1.138/32"
		}
	],
	"dns-server": [
		"192.168.1.254"
	],
	"dns-search": [
		
	],
	"neighbors": [
		
	],
	"inactive": {
		"ipv4-address": [
			
		],
		"ipv6-address": [
			
		],
		"route": [
			
		],
		"dns-server": [
			
		],
		"dns-search": [
			
		],
		"neighbors": [
			
		]
	},
	"data": {
		"leasetime": 86400
	}
}

tc -s qdisc:

qdisc noqueue 0: dev lo root refcnt 2 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
qdisc mq 0: dev eth0 root 
 Sent 826 bytes 7 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
qdisc fq_codel 0: dev eth0 parent :8 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev eth0 parent :7 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev eth0 parent :6 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev eth0 parent :5 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev eth0 parent :4 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev eth0 parent :3 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev eth0 parent :2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev eth0 parent :1 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn 
 Sent 826 bytes 7 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc mq 0: dev eth1 root 
 Sent 9399397671 bytes 9158632 pkt (dropped 1, overlimits 0 requeues 14832) 
 backlog 0b 0p requeues 14832
qdisc fq_codel 0: dev eth1 parent :8 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev eth1 parent :7 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev eth1 parent :6 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev eth1 parent :5 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev eth1 parent :4 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev eth1 parent :3 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev eth1 parent :2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev eth1 parent :1 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn 
 Sent 9399397671 bytes 9158632 pkt (dropped 1, overlimits 0 requeues 14832) 
 backlog 0b 0p requeues 14832
  maxpacket 27396 drop_overlimit 0 new_flow_count 44604 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc cake 801d: dev eth2 root refcnt 9 bandwidth 190Mbit besteffort dual-srchost nat nowash no-ack-filter split-gso rtt 100.0ms raw overhead 0 
 Sent 738197840 bytes 1042889 pkt (dropped 239, overlimits 1428895 requeues 44) 
 backlog 0b 0p requeues 44
 memory used: 365056b of 9500000b
 capacity estimate: 190Mbit
 min/max network layer size:           42 /    1514
 min/max overhead-adjusted size:       42 /    1514
 average network hdr offset:           14

                  Tin 0
  thresh        190Mbit
  target          5.0ms
  interval      100.0ms
  pk_delay         44us
  av_delay         31us
  sp_delay          1us
  backlog            0b
  pkts          1043128
  bytes       738545346
  way_inds         1122
  way_miss          170
  way_cols            0
  drops             239
  marks               0
  ack_drop            0
  sp_flows            6
  bk_flows            1
  un_flows            0
  max_len          1514
  quantum          1514

qdisc ingress ffff: dev eth2 parent ffff:fff1 ---------------- 
 Sent 2455180997 bytes 1861059 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev lan0 root refcnt 2 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev lan1 root refcnt 2 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev lan2 root refcnt 2 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev lan3 root refcnt 2 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev lan4 root refcnt 2 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev br-lan root refcnt 2 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev wg0 root refcnt 2 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev wg1 root refcnt 2 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
qdisc cake 801e: dev ifb4eth2 root refcnt 2 bandwidth 9500Mbit besteffort dual-dsthost nat wash no-ack-filter split-gso rtt 100.0ms raw overhead 0 
 Sent 2460654766 bytes 1821196 pkt (dropped 39863, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
 memory used: 6998092b of 15140Kb
 capacity estimate: 9500Mbit
 min/max network layer size:           60 /    1514
 min/max overhead-adjusted size:       60 /    1514
 average network hdr offset:           14

                  Tin 0
  thresh       9500Mbit
  target          5.0ms
  interval      100.0ms
  pk_delay         10us
  av_delay          3us
  sp_delay          2us
  backlog            0b
  pkts          1861059
  bytes      2520990591
  way_inds         1304
  way_miss          167
  way_cols            0
  drops           39863
  marks               0
  ack_drop            0
  sp_flows            5
  bk_flows            1
  un_flows            0
  max_len         60560
  quantum          1514

Excellent, you already figured out the take-home message from that exercise :wink:

qdisc cake 801e: dev ifb4eth2 root refcnt 2 bandwidth 9500Mbit besteffort dual-dsthost nat wash no-ack-filter split-gso rtt 100.0ms raw overhead 0

9500Mbit: that looks like one 0 too many, no?

overhead 0: this essentially means the kernel's overhead of 14 is used, which is in all likelihood too little, but as before this is a secondary problem.

Now, you get a goodput of 591.6 Mbps, which expands to (assuming TCP/IPv4, without options):

591.6 * ((1500+14)/(1500-20-20)) = 613.48 Mbps gross shaper rate…
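
The same arithmetic as a throwaway one-liner, in case you want to plug in other goodput numbers (the 591.6 is just the example value from above):

# goodput (Mbps) -> gross shaper rate, assuming plain ethernet framing and TCP/IPv4 without options
awk 'BEGIN { goodput = 591.6; printf "%.2f Mbps gross\n", goodput * (1500 + 14) / (1500 - 20 - 20) }'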

As I said before, that is about what I got with my omnia for bi-directional shaping, but with uni-directional shaping I would expect more.

It is possible that too much processing happens on one of the CPUs while the other idles, so please post the output of:

cat /proc/interrupts 

which should show how many interrupts (sorted by number) are processed on which CPU.
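
Should it turn out that all ethernet interrupts land on a single CPU, one option (beyond the RPS/XPS change below) would be to re-pin an individual IRQ via its smp_affinity bitmask; the IRQ number 39 below is purely a placeholder, take the real one for eth2 from your own /proc/interrupts output, and note that not every interrupt controller allows re-pinning:

# example only: move the eth2 IRQ (assumed to be 39 here) onto CPU1 (bitmask 0x2)
echo 2 > /proc/irq/39/smp_affinity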

I would guess that OpenWrt's default SMP policy might hurt a bit, so let's check that:

Please paste the following “test” into a shell on your omnia:

# print the current RPS (receive) and XPS (transmit) CPU masks for every network device
for file in /sys/class/net/*
do
echo $file RX rps_cpus
cat $file"/queues/rx-0/rps_cpus"
echo $file TX xps_cpus
cat $file"/queues/tx-0/xps_cpus"
done

I expect you will mostly see 0 or 2. If so, run the following and repeat a speedtest (while monitoring the load):

# steer RX/TX packet processing to both CPUs (bitmask 3 = CPU0 + CPU1) for every device
for file in /sys/class/net/*
do
echo 3 > $file"/queues/rx-0/rps_cpus"
echo 3 > $file"/queues/tx-0/xps_cpus"
done

After this the “test” from before should give you 3s…
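
If that mask turns out to help, one way to keep it across reboots would be to add the same loop to /etc/rc.local (before the final exit 0); just a sketch, and it will not cover interfaces that are created later:

# /etc/rc.local fragment: re-apply the RPS/XPS CPU masks (0x3 = CPU0 + CPU1) at boot
for file in /sys/class/net/*
do
echo 3 > "$file/queues/rx-0/rps_cpus"
echo 3 > "$file/queues/tx-0/xps_cpus"
done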

/proc/interrupts:

           CPU0       CPU1       
 17:          0          0     GIC-0  27 Edge      gt
 18:    6499231   12416815     GIC-0  29 Edge      twd
 19:          0          0      MPIC   5 Level     armada_370_xp_per_cpu_tick
 20:          0          0      MPIC   3 Level     arm-pmu
 21:    1613273          0     GIC-0  34 Level     mv64xxx_i2c
 22:         19          0     GIC-0  44 Level     ttyS0
 37:          7          0      MPIC   8 Level     eth0
 38:   23251031          0      MPIC  10 Level     eth1
 39:   23004770          0      MPIC  12 Level     eth2
 40:          0          0     GIC-0  50 Level     ehci_hcd:usb1
 41:          0          0     GIC-0  51 Level     f1090000.crypto
 42:          0          0     GIC-0  52 Level     f1090000.crypto
 43:          0          0     GIC-0  53 Level     f10a3800.rtc
 44:          0          0     GIC-0  58 Level     ahci-mvebu[f10a8000.sata]
 45:     259167          0     GIC-0  57 Level     mmc0
 46:          0          0     GIC-0  48 Level     xhci-hcd:usb2
 47:          0          0     GIC-0  49 Level     xhci-hcd:usb4
 49:          2          0     GIC-0  54 Level     f1060800.xor
 50:          2          0     GIC-0  97 Level     f1060900.xor
 58:         16         26  mv88e6xxx-g1   7 Edge      mv88e6xxx-g2
 60:          9          2  mv88e6xxx-g2   0 Edge      mv88e6xxx-1:00
 61:          4          0  mv88e6xxx-g2   1 Edge      mv88e6xxx-1:01
 62:          7         24  mv88e6xxx-g2   2 Edge      mv88e6xxx-1:02
 63:          0          0  mv88e6xxx-g2   3 Edge      mv88e6xxx-1:03
 64:          0          0  mv88e6xxx-g2   4 Edge      mv88e6xxx-1:04
 75:          0          0  mv88e6xxx-g2  15 Edge      mv88e6xxx-watchdog
 76:          1          0  f1018140.gpio  14 Level     8-0071
 78:         31          0  MPIC MSI 1048576 Edge      ath10k_pci
 79:          0          0     GIC-0  61 Level     ath9k
IPI0:          0          1  CPU wakeup interrupts
IPI1:          0          0  Timer broadcast interrupts
IPI2:    1541885     825811  Rescheduling interrupts
IPI3:        106    9086606  Function call interrupts
IPI4:          0          0  CPU stop interrupts
IPI5:          0          0  IRQ work interrupts
IPI6:          0          0  completion interrupts
Err:          0

Please paste the following “test” into a shell on your omnia:

As you expected, there are only 0s and 2s.

If so run the following and repeat a speedtest (while monitoring the load):

Performance is worse (450 Mbps) and the softirq load is significant, up to 50% of CPU1 in si.

Thanks, that is less helpful than I hoped… Could be that there is/was a kernel change that somehow reduced the achievable shaper rate significantly.

Yes, that is the normal default for OpenWrt 19.07, but for the omnia that used to be sub-optimal: all ethernet IRQs are processed on CPU0 and then both shapers ended up on CPU1, which is problematic; setting this to 3 should actually help… I wonder, do you have irqbalance installed and active?
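
A quick way to check both, just as a sketch:

# is the irqbalance package installed, and is a process by that name running?
opkg list-installed | grep irqbalance
ps | grep '[i]rqbalance'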

That is unexpected… do you have anything else installed on your omnia which might interfere with these tests, like PaKon or Netmetr speedtests, or anything else computationally intensive like containers?

I wonder do you have irqbalance installed and active?

irqbalance is not running, nor is it installed, AFAICS.

like PaKon or Netmetr speedtests, or anything else computationally intensive like containers?

No PaKon, Netmetr, or containers; just collectd perhaps, which is sending data elsewhere (RRDtool is disabled). The only slight CPU usage at baseline is from foris-controller.

Hm, it looks suspiciously related to Bufferbloat & SQM & Omnia - #24 by tonyquan

I’ll look at the changes there.