Very bad ping times

Last week my router started to report a read-only file system.
So I did the recommended thing and reinstalled the OS.
Turns out that is now version 5.1.4 and I can’t reinstall my saved config.
That was annoying but not really a big problem.

After the update I experience some really horrible ping times.

Setup:
Laptop directly connected to router by cable.

Ping to google when logged into the router: steady 1.7ms

Ping to google from laptop:

64 bytes from 216.58.211.3: icmp_seq=955 ttl=118 time=2.345 ms
64 bytes from 216.58.211.3: icmp_seq=956 ttl=118 time=23.314 ms
64 bytes from 216.58.211.3: icmp_seq=957 ttl=118 time=4.007 ms
64 bytes from 216.58.211.3: icmp_seq=958 ttl=118 time=2.408 ms
64 bytes from 216.58.211.3: icmp_seq=959 ttl=118 time=2.359 ms
64 bytes from 216.58.211.3: icmp_seq=960 ttl=118 time=113.405 ms
Request timeout for icmp_seq 961
Request timeout for icmp_seq 962
64 bytes from 216.58.211.3: icmp_seq=961 ttl=118 time=2034.436 ms
64 bytes from 216.58.211.3: icmp_seq=962 ttl=118 time=2839.125 ms
64 bytes from 216.58.211.3: icmp_seq=963 ttl=118 time=2112.842 ms
64 bytes from 216.58.211.3: icmp_seq=965 ttl=118 time=413.990 ms
64 bytes from 216.58.211.3: icmp_seq=966 ttl=118 time=6.050 ms
64 bytes from 216.58.211.3: icmp_seq=967 ttl=118 time=2.427 ms
64 bytes from 216.58.211.3: icmp_seq=968 ttl=118 time=2.631 ms
64 bytes from 216.58.211.3: icmp_seq=969 ttl=118 time=46.485 ms
64 bytes from 216.58.211.3: icmp_seq=970 ttl=118 time=2.463 ms
64 bytes from 216.58.211.3: icmp_seq=971 ttl=118 time=15.688 ms

This is not really acceptable.
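
For anyone who wants to put numbers on a log like the one above rather than eyeballing it, here is a minimal Python sketch. It only parses pasted ping output; the function name `summarize_ping` and the 100 ms "spike" threshold are my own assumptions, not part of any tool.

```python
import re
import statistics

# Matches the round-trip time on reply lines such as
# "64 bytes from 216.58.211.3: icmp_seq=955 ttl=118 time=2.345 ms"
RTT_PATTERN = re.compile(r"time=(\d+(?:\.\d+)?) ms")


def summarize_ping(output, spike_ms=100.0):
    """Return basic latency statistics for pasted ping output.

    spike_ms is an arbitrary threshold (assumption) above which a
    reply is counted as a latency spike.
    """
    rtts = [float(m.group(1)) for m in RTT_PATTERN.finditer(output)]
    timeouts = sum(
        1 for line in output.splitlines()
        if line.startswith("Request timeout")
    )
    if not rtts:
        # No replies at all: only the timeout count is meaningful.
        return {"replies": 0, "timeouts": timeouts}
    return {
        "replies": len(rtts),
        "timeouts": timeouts,
        "min_ms": min(rtts),
        "median_ms": statistics.median(rtts),
        "max_ms": max(rtts),
        "spikes": sum(1 for r in rtts if r > spike_ms),
    }


# A few lines taken from the log above.
sample = """\
64 bytes from 216.58.211.3: icmp_seq=959 ttl=118 time=2.359 ms
64 bytes from 216.58.211.3: icmp_seq=960 ttl=118 time=113.405 ms
Request timeout for icmp_seq 961
64 bytes from 216.58.211.3: icmp_seq=961 ttl=118 time=2034.436 ms
64 bytes from 216.58.211.3: icmp_seq=967 ttl=118 time=2.427 ms
"""

summary = summarize_ping(sample)
print(summary)
```

Running the same summary on a log taken from the router and one taken from the laptop makes the difference between the two easier to compare.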

I included eth0 in the lan interface. That helped somewhat. Before that I would get intermittent pings of several seconds.

Any idea what could cause this?

Did not have the problem with 3.X version of the OS.

Is that the same IP as from the laptop? At the same time?
Have you tried running traceroute to google and pinging a topologically closer IP (ideally the first one behind the router)?

Yes, the same ip address was used.
Just for fun I tried the gateway address of my local LAN interface:

64 bytes from 192.168.1.1: icmp_seq=11 ttl=64 time=0.669 ms
64 bytes from 192.168.1.1: icmp_seq=12 ttl=64 time=5.728 ms
64 bytes from 192.168.1.1: icmp_seq=13 ttl=64 time=0.604 ms
64 bytes from 192.168.1.1: icmp_seq=14 ttl=64 time=0.586 ms
64 bytes from 192.168.1.1: icmp_seq=15 ttl=64 time=16.593 ms
64 bytes from 192.168.1.1: icmp_seq=16 ttl=64 time=46.604 ms
64 bytes from 192.168.1.1: icmp_seq=17 ttl=64 time=12.207 ms
64 bytes from 192.168.1.1: icmp_seq=18 ttl=64 time=13.299 ms
64 bytes from 192.168.1.1: icmp_seq=19 ttl=64 time=4.666 ms
64 bytes from 192.168.1.1: icmp_seq=20 ttl=64 time=0.572 ms
64 bytes from 192.168.1.1: icmp_seq=21 ttl=64 time=0.588 ms
64 bytes from 192.168.1.1: icmp_seq=22 ttl=64 time=8.235 ms
64 bytes from 192.168.1.1: icmp_seq=23 ttl=64 time=18.756 ms

I really have no idea what is going wrong here.

Isn’t your emmc dying? What does dmesg | grep -i btrfs and grep -i btrfs /var/log/messages say?

It didn’t say anything when I first tried (neither dmesg nor messages). So I restarted the router and got this from dmesg and messages:

[    0.000000] Kernel command line: earlyprintk console=ttyS0,115200 rootfstype=btrfs rootdelay=2 root=b301 rootflags=subvol=@,commit=5 rw cfg80211.freg=**
[    4.861774] Btrfs loaded, crc32c=crc32c-generic
[    8.465163] BTRFS: device fsid 9e3eb291-3a37-4e15-9d76-153bec1fe6cc devid 1 transid 901 /dev/root
[    8.474686] BTRFS info (device mmcblk0p1): disk space caching is enabled
[    8.481412] BTRFS info (device mmcblk0p1): has skinny extents
[    8.491718] BTRFS info (device mmcblk0p1): enabling ssd optimizations
[    8.500403] VFS: Mounted root (btrfs filesystem) on device 0:12.
[   12.193416] BTRFS info (device mmcblk0p1): disk space caching is enabled

Okay, so fortunately it seems it isn’t a damaged filesystem…
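
A quick way to do the same check on a longer log is to filter for btrfs lines that report a problem. This is only a sketch: it assumes the kernel tags btrfs messages with a severity word right after "BTRFS" (as in the "BTRFS info" lines above), and the function name `btrfs_problems` is made up for illustration.

```python
import re

# Assumption: problem reports look like "BTRFS warning (...)",
# "BTRFS error (...)" or "BTRFS critical (...)", mirroring the
# "BTRFS info (...)" lines seen in the dmesg output above.
PROBLEM = re.compile(r"BTRFS (warning|error|critical)", re.IGNORECASE)


def btrfs_problems(log_text):
    """Return the log lines that look like btrfs warnings or errors."""
    return [line for line in log_text.splitlines() if PROBLEM.search(line)]


# Lines copied from the dmesg output above: nothing should be flagged.
posted = """\
[    4.861774] Btrfs loaded, crc32c=crc32c-generic
[    8.474686] BTRFS info (device mmcblk0p1): disk space caching is enabled
[    8.481412] BTRFS info (device mmcblk0p1): has skinny extents
[    8.500403] VFS: Mounted root (btrfs filesystem) on device 0:12.
"""
print(btrfs_problems(posted))  # []

# A made-up line (NOT from this thread) to show what would be flagged.
hypothetical = "[    9.000000] BTRFS error (device mmcblk0p1): made-up example"
print(btrfs_problems(hypothetical))
```

An empty result only means no such lines were logged; it does not prove the eMMC is healthy.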

To resolve your issue, I’d continue by connecting to the router via a different RJ45 port, maybe changing the cable… And what if you connect to the router via wifi?

I tried all that without any change at all.
Actually the router was working fine (with a broken file system) before I reflashed with OS 5.1.4.
Is it actually possible to reflash to OS 3.X?

I tried all router ports. Same issue on all ports.
Wifi - same issue, slightly worse even.

Today I tried to ping my laptop from the router.

64 bytes from 192.168.1.220: icmp_req=258 ttl=64 time=0.475 ms
64 bytes from 192.168.1.220: icmp_req=259 ttl=64 time=0.606 ms
^C
--- 192.168.1.220 ping statistics ---
259 packets transmitted, 259 received, 0% packet loss, time 268174ms
rtt min/avg/max/mdev = 0.449/0.708/1.067/0.114 ms

As you can see, that worked as expected.

Is it fair to assume that something is wrong with the firewall? (I haven’t done any editing to the firewall, only standard rules are in place.) Is there anything that I can disable for testing purposes?

Hello @wisser,
if I followed your debug correctly it was these tests:

  1. ping google from laptop (omnia as router) —> bad ping
  2. ping router from laptop —> bad ping
  3. ping laptop from router —> bad ping
  4. ping google from router —> ping ok?

Are you sure that the problem is the router and not your laptop? Could you:

  1. Plug internet cable directly to the laptop (not connecting to the turris router).
  2. Try another device? Maybe a mobile phone if you have no other computer.

  1. ping from router to laptop is below one ms.
  2. ping from laptop to router is between 1ms and 4000ms.

I have bad ping times from all other computers on my network. (My son is constantly complaining because he can’t play online games).
I tried direct cable, too, no change.

Everything worked fine before upgrading to OS 5.1.4.

If the problem persisted through a system reflash and is not port-specific, I’d suspect the problem to lie outside the router. Could you have a second DHCP server running in the network or something like that? Or something that sends malformed ARP responses?

What if you disconnect everything from the router (including WAN cable) and then connect only your laptop? And can you post the output of ip route from both the router and the laptop?

I did not have any ping time problems before the reflash. The reflash was only done because the router complained about the read-only file system. But my network worked really well before.

  • I followed your advice and tested the router with the laptop only. This gave me excellent ping times.
  • I reconnected my network, still excellent ping times
  • powered up wifi, excellent ping times
  • plugged in WAN, bad ping times (between laptop and router!!!)
  • unplugged everything
  • router no wifi, only laptop connected, excellent ping times, plug in WAN, bad ping times again.
    So it seems something on the WAN side is causing this.
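
The elimination reasoning in the steps above can be written down as a small Python sketch: anything present in every bad run and never present in a good run is the suspect. The labels ("laptop", "lan", "wifi", "wan") and the function name `suspects` are just illustrative names for the test steps, not anything the router reports.

```python
# Each observation: (set of things connected/enabled, were ping times good?)
# These mirror the test steps listed above, in order.
observations = [
    ({"laptop"}, True),
    ({"laptop", "lan"}, True),
    ({"laptop", "lan", "wifi"}, True),
    ({"laptop", "lan", "wifi", "wan"}, False),
    ({"laptop"}, True),
    ({"laptop", "wan"}, False),
]


def suspects(observations):
    """Return components present in every bad run and in no good run."""
    bad = [parts for parts, ok in observations if not ok]
    good = [parts for parts, ok in observations if ok]
    if not bad:
        return set()
    in_every_bad = set.intersection(*bad)
    in_some_good = set().union(*good)
    return in_every_bad - in_some_good


print(suspects(observations))  # {'wan'}
```

This only narrows things down to the WAN side as a whole; it cannot say whether the cause is the WAN traffic itself or something on the router that reacts to it.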

I have a 1Gbit fiber connection, which is converted to ethernet and then plugged into the WAN RJ45 port on the Turris Omnia. On that cable there is TV on one vlan, phone on another vlan, and internet (no vlan).

Ohh, before I forget: Thank you for your help and patience. I really appreciate it!

Do you run PAKON or any other computationally expensive services on the router, by chance?

Most probably! What should I disable?

Could you (by chance) be using the same subnet addresses for your local network as your ISP uses for your WAN?
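
If you want to check this quickly, Python's standard ipaddress module can tell you whether two ranges overlap. This is a rough sketch: the /24 mask for the 192.168.1.1 LAN is an assumption, both WAN ranges below are placeholders (203.0.113.0/24 is a documentation range), and `subnets_overlap` is just a helper name I made up.

```python
import ipaddress


def subnets_overlap(lan_cidr, wan_cidr):
    """Return True if the LAN and WAN address ranges share any address."""
    # strict=False lets us pass a host address with a mask,
    # e.g. "192.168.1.1/24" instead of "192.168.1.0/24".
    lan = ipaddress.ip_network(lan_cidr, strict=False)
    wan = ipaddress.ip_network(wan_cidr, strict=False)
    return lan.overlaps(wan)


# LAN gateway from this thread with an assumed /24 mask,
# compared against two placeholder WAN ranges.
print(subnets_overlap("192.168.1.1/24", "192.168.0.0/16"))  # True
print(subnets_overlap("192.168.1.1/24", "203.0.113.0/24"))  # False
```

Replace the second argument with the address and mask your WAN interface actually received.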

Unbelievable as it is, my ISP actually uses public IP addresses.

But it seems moeller0’s hint helped a lot. I uninstalled pakon, which didn’t help. But since I uninstalled collectd, ping times are perfect!

A big thank you to both of you!

Great that you solved the problem! Now the question remains why collectd affects the ping times, but that’s another issue and depends on you having the time and will to debug it. If you decide to pursue it, consider sending diagnostics to the tech support email; they might be able to figure something out.