Netlink issue: "ip route show" hangs (same thing for many network programs)

bortzmeyer · July 2, 2018, 8:38am

Since the last upgrade, the LXC containers on my Turris Omnia have problems with many (all) network programs. Even “ping -c 1 127.0.0.1” takes 20 to 40 seconds to complete:

% time ping -c 1 127.0.0.1
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.063 ms

--- 127.0.0.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.063/0.063/0.063/0.000 ms
ping -c 1 127.0.0.1  0.00s user 0.00s system 0% cpu 28.573 total

Running with strace -r, I see that ping hangs when exiting, in the exit_group() system call:

 0.000086 write(1, "rtt min/avg/max/mdev = 0.293/0.2"..., 50rtt min/avg/max/mdev = 0.293/0.293/0.293/0.000 ms
) = 50
 0.000236 exit_group(0)             = ?
13.016139 +++ exited with 0 +++
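
To spot such stalls in a longer trace, a small script can help. This is only a minimal sketch, assuming `strace -r` output in which each line starts with a relative timestamp; the function name `slowest_calls` and the 1-second threshold are illustrative choices. Since `-r` prints the time elapsed since the start of the previous system call, a large value is charged to the preceding line.

```python
import re

# Matches a leading relative timestamp such as " 0.000236 exit_group(0) = ?"
LINE_RE = re.compile(r"^\s*(\d+\.\d+)\s+(.*)$")


def slowest_calls(trace_text, threshold=1.0):
    """Return (seconds, call) pairs for calls followed by a gap >= threshold.

    The gap printed on a line is charged to the previous timestamped line,
    because strace -r measures from the start of the previous call.
    """
    previous = None
    slow = []
    for line in trace_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # wrapped output lines carry no timestamp
        delta, call = float(match.group(1)), match.group(2)
        if previous is not None and delta >= threshold:
            slow.append((delta, previous))
        previous = call
    return slow


if __name__ == "__main__":
    sample = """\
 0.000086 write(1, "rtt min/avg/max/mdev"..., 50) = 50
 0.000236 exit_group(0)             = ?
13.016139 +++ exited with 0 +++
"""
    for seconds, call in slowest_calls(sample):
        print(f"{seconds:9.3f}s  {call}")
```

On the excerpt above, this should report exit_group(0) as the only call slower than one second.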

Other programs, such as ssh, hang on a sendto call:

0.000062 sendto(3, {{len=20, type=0x16 /* NLMSG_??? */, flags=NLM_F_REQUEST|0x300, seq=1530520638, pid=0}, "\x00\x00\x00\x00"}, 20, 0, {sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, 12
 2.999693 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base={{len=20, type=NLMSG_DONE, flags=NLM_F_MULTI, seq=1530520661, pid=26088}, 0}, iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 20
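
To check whether the delay comes from the kernel side rather than from ssh itself, the same kind of request can be replayed and timed outside of ssh. The following is a minimal sketch, assuming Linux and Python 3. In the kernel headers, type 0x16 is RTM_GETADDR and flags 0x300 are NLM_F_DUMP, so the request in the trace is an address dump. The function names `build_getaddr_dump` and `time_getaddr_dump` are illustrative, and I have not verified that this reproduces the hang.

```python
import socket
import struct
import time

# Constants from <linux/netlink.h> and <linux/rtnetlink.h>
NETLINK_ROUTE = 0
RTM_GETADDR = 22          # 0x16, the message type seen in the trace
NLM_F_REQUEST = 0x001
NLM_F_DUMP = 0x300        # NLM_F_ROOT | NLM_F_MATCH
NLMSG_ERROR = 2
NLMSG_DONE = 3
NLMSG_HDR = struct.Struct("=IHHII")  # len, type, flags, seq, pid


def build_getaddr_dump(seq):
    """Build the 20-byte request: 16-byte header plus 4 zero bytes of payload."""
    payload = b"\x00\x00\x00\x00"  # struct rtgenmsg (AF_UNSPEC), padded to 4 bytes
    header = NLMSG_HDR.pack(NLMSG_HDR.size + len(payload), RTM_GETADDR,
                            NLM_F_REQUEST | NLM_F_DUMP, seq, 0)
    return header + payload


def time_getaddr_dump(timeout=60.0):
    """Send the dump request and return the seconds until NLMSG_DONE arrives."""
    with socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, NETLINK_ROUTE) as sock:
        sock.settimeout(timeout)
        sock.bind((0, 0))  # let the kernel assign our port id
        start = time.monotonic()
        sock.sendto(build_getaddr_dump(seq=1), (0, 0))  # nl_pid 0 is the kernel
        while True:
            data = sock.recv(65536)
            offset = 0
            while offset + NLMSG_HDR.size <= len(data):
                length, msg_type, _flags, _seq, _pid = NLMSG_HDR.unpack_from(data, offset)
                if msg_type == NLMSG_DONE:
                    return time.monotonic() - start
                if msg_type == NLMSG_ERROR:
                    raise OSError("kernel returned NLMSG_ERROR")
                if length < NLMSG_HDR.size:
                    raise ValueError("malformed Netlink message")
                offset += (length + 3) & ~3  # messages are 4-byte aligned


if __name__ == "__main__":
    print(f"RTM_GETADDR dump took {time_getaddr_dump():.3f}s")
```

Running it both inside a container and on the host would show whether the slowness is specific to the containers.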

It seems to be a Netlink problem. I see that Netlink on the Turris itself also has a problem:

root@turris:~# time ip route show
default via 82.251.62.254 dev eth1  proto static  src 82.251.62.29 
82.251.62.0/24 dev eth1  proto kernel  scope link  src 82.251.62.29 
82.251.62.254 dev eth1  proto static  scope link  src 82.251.62.29 
192.168.2.0/24 dev br-lan  proto kernel  scope link  src 192.168.2.254 
real	0m 18.47s
user	0m 0.00s
sys	0m 0.00s

Why does it take so much time for Netlink route requests?
Is this the reason why my network programs hang? (On the Turris itself, network programs work fine; only the LXC containers have problems.)

bortzmeyer · July 2, 2018, 8:42am

On the Turris:

root@turris:~# uname -a
Linux turris 4.4.138-1e8e1b4c23f383e990eb3c4f490f5f2e-1 #1 SMP Tue Jun 26 07:54:39 CEST 2018 armv7l n
 
root@turris:~# cat /etc/turris-version 
3.10.3

bortzmeyer · July 2, 2018, 8:15pm

Rebooting cured the problem.

root@turris:~# time ip route show
default via 82.251.62.254 dev eth1  proto static  src 82.251.62.29 
82.251.62.0/24 dev eth1  proto kernel  scope link  src 82.251.62.29 
82.251.62.254 dev eth1  proto static  scope link  src 82.251.62.29 
192.168.2.0/24 dev br-lan  proto kernel  scope link  src 192.168.2.254 
real	0m 0.00s
user	0m 0.00s
sys	0m 0.00s