Turris limited network throughput

My Turris 1.0 slows down my network throughput roughly five times, at least on download.

As I have been working from home in the past weeks, I decided to opt for a new internet provider with 100/10 Mbps (download/upload) speeds.

Once the ISP finished their work, I connected my notebook to their LAN port using a UTP cable and simply ran a netmetr test. And voilà, the speed was what they advertised. Nice.

The next step was connecting the Turris WAN port instead of my notebook and running a netmetr test directly from the Turris:

BusyBox v1.29.3 () built-in shell (ash)

  _______  _    _  _____   _____   _____   _____
 |__   __|| |  | ||  __ \ |  __ \ |_   _| / ____|
    | |   | |  | || |__) || |__) |  | |  | (___
    | |   | |  | ||  _  / |  _  /   | |   \___ \
    | |   | |__| || | \ \ | | \ \  _| |_  ____) |
    |_|    \____/ |_|  \_\|_|  \_\|_____||_____/



root@turris:~# netmetr
Checking uuid on the control server...
Requesting test config from the control server...
Starting ping test...
ping_1_msec = 17.93
ping_2_msec = 14.60
ping_3_msec = 17.78
ping_4_msec = 17.93
ping_5_msec = 14.93
ping_6_msec = 19.14
ping_7_msec = 16.82
ping_8_msec = 19.61
ping_9_msec = 14.84
ping_10_msec = 17.53
Starting speed test...
==== rmbt v3.11-68-gfe363acb6-dirty ====
connecting...
connected with 3 flow(s) for dl; 3 flow(s) for ul
pretest downlink start... (min 1s)
pretest downlink end.
rtt_tcp_payload start... (11 times)
rtt_tcp_payload end.
downlink test start... (5s)
downlink test end.
pretest uplink start... (min 1s)
pretest uplink end.
uplink test start... (5s)
uplink test end.
disconnecting.
dl_throughput_mbps = 115.128205
ul_throughput_mbps = 11.756293
Exiting.

Exactly what I was expecting based on the connectivity parameters from the ISP.

The problem is that once I ran the test from a browser on a client connected to my Turris by a UTP cable (so as not to be influenced by any potential WiFi issues), the download speed was between 20 and 40 Mbps.

The good news is that I have a Turris Mox (baseboard and an F module for USB disks) as an ordinary client in my network.
The bad news is that it confirmed the terrible speed limitation:

BusyBox v1.28.4 () built-in shell (ash)

      ______                _         ____  _____
     /_  __/_  ____________(_)____   / __ \/ ___/
      / / / / / / ___/ ___/ / ___/  / / / /\__ \
     / / / /_/ / /  / /  / (__  )  / /_/ /___/ /
    /_/  \__,_/_/  /_/  /_/____/   \____//____/

 -----------------------------------------------------
 TurrisOS 4.0.5, Turris Mox
 -----------------------------------------------------
root@mox:~# netmetr
Checking uuid on the control server...
Requesting test config from the control server...
Starting ping test...
ping_1_msec = 17.20
ping_2_msec = 43.30
ping_3_msec = 18.20
ping_4_msec = 19.10
ping_5_msec = 17.40
ping_6_msec = 19.20
ping_7_msec = 16.60
ping_8_msec = 21.40
ping_9_msec = 15.40
ping_10_msec = 15.40
Starting speed test...
==== rmbt ac255b554 ====
connecting...
connected with 3 flow(s) for dl; 3 flow(s) for ul
pretest downlink start... (min 1s)
pretest downlink end.
rtt_tcp_payload start... (11 times)
rtt_tcp_payload end.
downlink test start... (5s)
downlink test end.
pretest uplink start... (min 1s)
pretest uplink end.
uplink test start... (5s)
uplink test end.
disconnecting.
dl_throughput_mbps = 23.189040
ul_throughput_mbps = 12.447469
Exiting.

The previous ISP connection was limited to 20/20 Mbps, so I can't say whether the problem is new or not, nor am I able to test a higher uplink speed.

The only change I made in my Turris 1.0 configuration was switching the WAN port from a static address and DNS servers to DHCP, so I would not expect this to be the root cause of the speed issue.

My assumption is that such a slowdown from WAN to LAN is not expected. Are there some steps I can take to identify the root cause of the speed problem and fix it?

Thanks for any advice.
Ales

Maybe some old SQM QoS settings?

I'm not aware of any. How can this be checked/listed?

You’d have to know that, because it’s being installed manually.

Then it is safe to say that such settings aren't in place.

That's not true. You can set up SQM via Foris. Let's check /etc/config/sqm.

Perhaps take a look at the packet statistics for the various interfaces; it might provide a clue.

  • ip -s l
  • tc -s q s
  • tc -s f s
  • tc -s c s
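When reading the `tc -s q s` output, the interesting part is whether any qdisc shows nonzero `dropped` or `overlimits` counters. A minimal sketch of such a check follows; the heredoc sample data is made up for illustration (with counters deliberately nonzero, unlike the real output in this thread), and on the router you would pipe the real `tc -s q s` into the same awk:

```shell
#!/bin/sh
# Sketch: flag qdiscs with nonzero dropped/overlimits counters.
# On the router: tc -s q s | awk '...'
# The heredoc is sample data (invented values) so this runs anywhere.
tc_output=$(cat <<'EOF'
qdisc fq_codel 0: dev eth1 root refcnt 2 limit 1024p flows 1024
 Sent 40662585091 bytes 30199291 pkt (dropped 12, overlimits 3 requeues 2746)
qdisc fq_codel 0: dev eth2 root refcnt 2 limit 1024p flows 1024
 Sent 12377890348 bytes 14751638 pkt (dropped 0, overlimits 0 requeues 156)
EOF
)
result=$(printf '%s\n' "$tc_output" | awk '
/^qdisc/ { dev = $5 }                  # remember which interface follows
/dropped/ {
    gsub(/[(,)]/, "")                  # strip punctuation around counters
    for (i = 1; i < NF; i++) {
        if ($i == "dropped"    && $(i+1) + 0 > 0) print dev ": dropped="    $(i+1)
        if ($i == "overlimits" && $(i+1) + 0 > 0) print dev ": overlimits=" $(i+1)
    }
}')
printf '%s\n' "$result"
```

With the sample above it reports only eth1, since eth2's counters are all zero.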

opkg whatdepends sqm-scripts

foris-controller-app depends on sqm-scripts

Here is the config:
root@turris:~# cat /etc/config/sqm

config queue 'eth1'
        option enabled '0'
        option interface 'eth1'
        option download '85000'
        option upload '10000'
        option qdisc 'fq_codel'
        option script 'simple.qos'
        option qdisc_advanced '0'
        option ingress_ecn 'ECN'
        option egress_ecn 'ECN'
        option qdisc_really_really_advanced '0'
        option itarget 'auto'
        option etarget 'auto'
        option linklayer 'none'

so it looks disabled.
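For reference, the flag can also be read back programmatically. This is just a sketch: the heredoc stands in for /etc/config/sqm so it runs anywhere, while on the router itself `uci -q get sqm.eth1.enabled` would be the direct equivalent:

```shell
#!/bin/sh
# Sketch: extract the 'enabled' flag from an SQM queue section.
# The heredoc stands in for /etc/config/sqm; on the router,
# `uci -q get sqm.eth1.enabled` does the same lookup.
sqm_config=$(cat <<'EOF'
config queue 'eth1'
        option enabled '0'
        option interface 'eth1'
EOF
)
enabled=$(printf '%s\n' "$sqm_config" | sed -n "s/.*option enabled '\(.*\)'.*/\1/p")
if [ "$enabled" = "1" ]; then
    echo "SQM shaping is active on this queue"
else
    echo "SQM shaping is disabled (enabled='$enabled')"
fi
```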

For `ip -s l`: except the RX/TX bytes and packets, all columns (errors, dropped, overrun, mcast) contain only 0.

root@turris:~# tc -s q s
qdisc noqueue 0: dev lo root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc fq_codel 0: dev eth0 root refcnt 2 limit 1024p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
 Sent 8707476 bytes 126092 pkt (dropped 0, overlimits 0 requeues 1)
 backlog 0b 0p requeues 1
  maxpacket 209 drop_overlimit 0 new_flow_count 63 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev eth1 root refcnt 2 limit 1024p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
 Sent 40662585091 bytes 30199291 pkt (dropped 0, overlimits 0 requeues 2746)
 backlog 0b 0p requeues 2746
  maxpacket 1514 drop_overlimit 0 new_flow_count 37971 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev eth2 root refcnt 2 limit 1024p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
 Sent 12377890348 bytes 14751638 pkt (dropped 0, overlimits 0 requeues 156)
 backlog 0b 0p requeues 156
  maxpacket 44220 drop_overlimit 0 new_flow_count 79008 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc noqueue 0: dev br-lan root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev wlan0 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc fq_codel 0: dev tun_turris root refcnt 2 limit 1024p flows 1024 quantum 1500 target 5.0ms interval 100.0ms ecn
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0

root@turris:~# tc -s f s
root@turris:~# tc -s c s
root@turris:~# echo $?
0

I'm not able to interpret the above listing, so I hope it shows something useful.
Thanks Ales

Is there a VPN tunnel involved, however?

A VPN is installed, but clients on the internal network get IP addresses via DHCP from the local LAN scope, not from the VPN, so I would expect the VPN not to be involved. But I can uninstall the VPN for a test if it makes sense.
Thanks

It is about routing (not DHCP). If the traffic generated by the router itself, e.g. with netmetr, is not routed through the VPN but the traffic from the T’s clients gets routed via the VPN it would explain the drop/discrepancy in bandwidth throughput.
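One way to check this would be to compare the egress interface the kernel picks for router-originated traffic versus client-sourced traffic, e.g. with `ip route get 8.8.8.8` versus `ip route get 8.8.8.8 from <client-ip> iif br-lan`. A small sketch of the comparison; the two route outputs below are hypothetical sample strings (the interface and address names are made up), and on the router you would capture the real command output instead:

```shell
#!/bin/sh
# Sketch: if policy routing sends client traffic through a VPN while
# router-originated traffic goes out the WAN, the "dev" field differs.
# On the router:
#   ip route get 8.8.8.8                                  # router-originated
#   ip route get 8.8.8.8 from 192.168.1.100 iif br-lan    # client-sourced
# The two strings below are hypothetical sample outputs.
router_route="8.8.8.8 via 10.0.0.1 dev eth2 src 10.0.0.15"
client_route="8.8.8.8 via 10.8.0.1 dev tun_turris src 10.8.0.6"
extract_dev() {
    printf '%s\n' "$1" | awk '{ for (i = 1; i < NF; i++) if ($i == "dev") print $(i+1) }'
}
dev1=$(extract_dev "$router_route")
dev2=$(extract_dev "$client_route")
if [ "$dev1" = "$dev2" ]; then
    echo "same egress interface: $dev1"
else
    echo "egress differs: router=$dev1 client=$dev2"
fi
```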

I've deleted the VPN, and the situation is still the same (from the speed point of view):
turris:~# tc -s q s
qdisc noqueue 0: dev lo root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc fq_codel 0: dev eth0 root refcnt 2 limit 1024p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
Sent 20088683 bytes 269796 pkt (dropped 0, overlimits 0 requeues 1)
backlog 0b 0p requeues 1
maxpacket 235 drop_overlimit 0 new_flow_count 250 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev eth1 root refcnt 2 limit 1024p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
Sent 46889251307 bytes 38220888 pkt (dropped 0, overlimits 0 requeues 10076)
backlog 0b 0p requeues 10076
maxpacket 1514 drop_overlimit 0 new_flow_count 98274 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev eth2 root refcnt 2 limit 1024p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
Sent 14825410904 bytes 24087505 pkt (dropped 0, overlimits 0 requeues 334)
backlog 0b 0p requeues 334
maxpacket 44220 drop_overlimit 0 new_flow_count 205305 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc noqueue 0: dev br-lan root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc noqueue 0: dev wlan0 root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

Is there anything (except a factory reset) I can try to do to fix that?
Thanks Ales

ad_VPN: instead of uninstalling, you can stop the daemon only (and possibly disable it, so it stays stopped after the next bootup)

ad_factory_reset: before a factory reset, try checking the "schnapps" tool; maybe you have some backup/snapshot where the network was working fine / as it used to

ad_WAN: if you set DHCP instead of static, I would check the setup of the WAN interface, disable the propagation of DNS servers by your ISP and define some of your own (Google, Cloudflare, CZ.NIC... ones) (LuCI > Network > Interfaces > WAN > Advanced Settings: "Use DNS servers advertised by peer" and "Use custom DNS servers"). That should not be the root cause, but it is better to be sure you have your preferred DNS in use.
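The same DNS settings can also be made directly in /etc/config/network. A hedged sketch (the `peerdns`/`dns` option names are standard OpenWrt; the resolver addresses are just examples):

```
config interface 'wan'
        option proto 'dhcp'
        option peerdns '0'          # ignore DNS servers advertised by the ISP
        list dns '1.1.1.1'          # example: Cloudflare
        list dns '193.17.47.1'      # example: CZ.NIC ODVR
```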

ad_generic: I would also check whether the zones (lan, br-lan, wan/wan6) are configured correctly (the in/out/fwd, masq... stuff).

btw: are you using only IPv4 or also IPv6? Is adblock/pihole in use? Is "miniupnpd" active? Are you using the sentinel/haas/ludus (or device detection) packages? Any extra firewall rules applied? Any extra setup for virtual LAN(s)?

To debug/analyze/stress-test the network, you can use https://iperf.fr/ (at first look it seems complicated, at second look not so much; they have pretty nice guides (I've used some for a samba/ssh/sftp tune-up some time ago) and I am a network/routing newbie :slight_smile:)

Here are some results from my last testing. Guide used: https://openmaniak.com/iperf.php
All services were up, with kind-of-normal traffic:

5 devices in the network, 3 LXC containers, 4 active users, OpenVPN on but not active, some downloads running over HTTPS (50-300 KB/s), an IRC network with two servers, two bots, 4 users and a bouncer on top of it, two SFTP servers, several static web sites for the local net and two sites hosted publicly, samba, transmission, plex.tv (no active stream), plus ludus, haas, sentinel, pihole.
Plus a corporate laptop with an active VPN (2 SSH sessions, 4 JDBC sessions, Discord/Skype/mail + all the fancy corporate apps around); I will test the speed from it later as well.

iperf report
[ ID] Interval       Transfer     Bandwidth
[ windows10's ubuntu] cable
[  4]  0.0-10.0 sec  1.05 GBytes   904 Mbits/sec
[  4]  0.0-32.0 sec  3.34 GBytes   896 Mbits/sec
[  4]  0.0-32.0 sec  3.35 GBytes   898 Mbits/sec
[ from lxc container ] on router
[  4]  0.0-32.0 sec  5.81 GBytes  1.56 Gbits/sec
[ another shell/screen session ] router
[  4]  0.0- 3.7 sec   264 MBytes   596 Mbits/sec
[  4]  0.0- 5.9 sec   418 MBytes   598 Mbits/sec
[ from RPI/raspian device] wifi 
[  4]  0.0- 2.0 sec  24.5 MBytes   101 Mbits/sec
[  4]  0.0- 7.5 sec  93.5 MBytes   105 Mbits/sec

ad_ssh/sftp/vpn: I've noticed that if compression is used, the bandwidth is reduced; using TLS does something very similar.

This counter value seems a little high, but comparing it to another node could be misleading, since:

  • it is a progressive counter and the node's uptime may have an impact
  • the kernel source code of net/sched/sch_generic.c (which generates the counter) varies between kernel versions (is the T running TOS3.x?)

If you want to keep an eye on the counter from time to time - tc -s q s | grep requeues


As both clients, the one with the browser and the M (assuming the latter is also connected by wire to the T), exhibit the same issue, you could attempt to debug the matter with iperf instances, as mentioned by @Maxmilian_Picmaus, but start locally:

  • iperf instances on:
    • T
    • M
    • other client (the one with the browser)

Then run the test between those local instances (bi-directionally, i.e. reverse the client and server roles of the respective iperf instances):

  • M <-> T
  • other client <-> T

If those turn out ok then the issue is likely with the packet forwarding (client -> T -> WAN) on the T.
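The local runs above could look roughly like the sketch below. iperf3 is assumed to be installable via opkg and the address is a placeholder; the report lines in the heredoc are sample data (borrowed from the figures quoted earlier in this thread) only to show how the per-interval bandwidth column can be averaged for comparison between runs:

```shell
#!/bin/sh
# Sketch of the suggested local iperf runs (commands only, placeholders):
#   on the Turris (T):   iperf3 -s
#   on the Mox (M):      iperf3 -c <T-lan-ip>        # M -> T
#   reverse direction:   iperf3 -c <T-lan-ip> -R     # T -> M
# Averaging the bandwidth column from a (sample) iperf report:
report=$(cat <<'EOF'
[  4]  0.0-10.0 sec  1.05 GBytes   904 Mbits/sec
[  4]  0.0-32.0 sec  3.34 GBytes   896 Mbits/sec
[  4]  0.0-32.0 sec  3.35 GBytes   898 Mbits/sec
EOF
)
avg=$(printf '%s\n' "$report" | awk '
/Mbits\/sec/ { sum += $(NF-1); n++ }     # bandwidth is the next-to-last field
END          { printf "%.0f", sum / n }')
echo "average: $avg Mbits/sec"
```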

Yes, this and the tc-output confirms that sqm/traffic shaping is not active.

Thanks for the suggestions. Yes, all tests are over TP cable on the Ethernet ports, no WiFi.

The speed between M and T is the same in both directions. I've also done a test binding the iperf server to the WAN port IP address, and the speed was the same.

One of the clients I used for testing is connected via a long cable, so it limits the test to 100 Mbps, which is fine given the internet connection speed. But to make sure it is reproducible, I'll do a test using a notebook and a cat5e or cat6 cable to verify the speed (I did it already [963 Mbps in both directions], but only between one of the devices and the notebook, so a test with brand new short cables might be a better check).

I also checked the requeues before and after a speed test, and all the figures remained the same as they were before the speed test.

  • is the T still running TOS3.x, or TOS >= 4.x already?
  • if still on TOS3.x, would it be feasible to test with TOS4.x|5.x?
  • with the change of ISP, did the type of WAN connectivity change (e.g. ADSL to VDSL)?
  • did the ISP change the CPE (modem)?
  • what type/brand of CPE is deployed (e.g. modem only or some modem-router combo)?
  • if it is a modem-router combo, is it set to bridge mode?

Recap (my apologies if I haven't read the previous posts carefully):

  1. When you test the download directly on the routers, there is no problem?

  2. It manifests itself on clients connected via UTP: is it one client, or has it been tested with multiple clients on cables?

  3. What web browser is used for the test? Isn't it Opera with an active VPN?

  4. Test in the browser: which test is used, and where is the test counterparty located? It is not the USA :slight_smile:?

  5. No specific messages in the log?

  • /etc/turris-version: 3.11.16
  • Upgrade: yes, but I would have to re-configure all settings in that case (but could move to BTRFS with snapshots…)
  • The former ISP was 5 GHz (WiFi), the new one is 5G
  • Yes, I was connected to a switch; now it is the ISP's router
  • A modem-router combo is deployed; I've tested both NAT and bridge mode, same results