Router reboots caused by memory fragmentation


#1

I have had spontaneous router reboots every 1-3 days since we changed ISPs. Almost every time, the internet connection dropped shortly before the reboot. Once, with some luck, I was able to read the router logs shortly before a reboot happened, and I found this very interesting entry:

2018-03-04T02:55:04+01:00 warning kernel[]: [86429.609956] netifd: page allocation failure: order:5, mode:0x24000c0

This means that the netifd process requested a memory allocation (if I am not wrong, order 5 means 2^5 contiguous pages of 4kB, i.e. 128kB) but the request failed. As the system had lots of free memory:

Normal free:17796kB min:3504kB low:4380kB high:5256kB active_anon:26476kB inactive_anon:28948kB active_file:271888kB inactive_file:258968kB .…

the reason is NOT memory exhaustion. The only remaining candidate is memory fragmentation. The following log entries support this hypothesis (note that the Normal zone has no free blocks of 128kB or larger):

2018-03-04T02:55:04+01:00 warning kernel[]: [86429.610316] Normal: 1176*4kB (UME) 576*8kB (UME) 412*16kB (UME) 50*32kB (UM) 6*64kB (UM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 17888kB
2018-03-04T02:55:04+01:00 warning kernel[]: [86429.610337] HighMem: 98*4kB (UM) 43*8kB (UM) 8*16kB (UM) 6*32kB (UM) 3*64kB (M) 5*128kB (M) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1888kB
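To make the reading of these lines concrete, here is a minimal Python sketch that parses such a free list and checks whether a block large enough for a given order is available. The function names are my own, and the 4kB page size is an assumption taken from the smallest block size in the log.

```python
import re

PAGE_KB = 4  # assumed page size, matching the smallest block in the log


def order_to_kb(order):
    """Size in kB of one contiguous block of the given order (2^order pages)."""
    return PAGE_KB * (1 << order)


def free_blocks(line):
    """Return {block_size_kb: count} parsed from a kernel free list line."""
    return {int(size): int(count)
            for count, size in re.findall(r"(\d+)\*(\d+)kB", line)}


def can_satisfy(line, order):
    """True if the zone has at least one free block big enough for the order."""
    need = order_to_kb(order)
    return any(count > 0 and size >= need
               for size, count in free_blocks(line).items())


normal = ("Normal: 1176*4kB (UME) 576*8kB (UME) 412*16kB (UME) 50*32kB (UM) "
          "6*64kB (UM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB "
          "= 17888kB")

print(order_to_kb(5))          # 128
print(can_satisfy(normal, 5))  # False
```

So the Normal zone has about 17MB free, but all of it is in pieces of 64kB or smaller, which is why the order 5 request could not be served from it.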

I found out that there was a similar issue on Arch Linux ARM (https://github.com/archlinuxarm/PKGBUILDs/pull/630), and the solution is to enable the following kernel compile options:

CONFIG_COMPACTION=y
CONFIG_SLUB=y
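If you want to check whether a kernel already has these two options set, something like the following Python sketch could work. It assumes the kernel exposes its configuration at /proc/config.gz, which is only the case when it was built with CONFIG_IKCONFIG_PROC. Otherwise you can feed the text of the .config file from the build tree to the same function. The function names are my own.

```python
import gzip

WANTED = ("CONFIG_COMPACTION", "CONFIG_SLUB")


def enabled_options(config_text, wanted=WANTED):
    """Map each wanted option to True if the config text sets it to 'y'."""
    lines = {line.strip() for line in config_text.splitlines()}
    return {opt: f"{opt}=y" in lines for opt in wanted}


def read_running_config(path="/proc/config.gz"):
    """Read the running kernel's config (needs CONFIG_IKCONFIG_PROC)."""
    with gzip.open(path, "rt") as fh:
        return fh.read()


if __name__ == "__main__":
    try:
        text = read_running_config()
    except OSError:
        print("no readable /proc/config.gz; kernel config is not exposed")
    else:
        for opt, ok in enabled_options(text).items():
            print(opt, "enabled" if ok else "NOT enabled")
```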

I recompiled and installed the kernel with these options, and my router has been running stable for months!

Initially I informed support about the issue (ticket #001486), and later, once I had found the solution, I requested a kernel change with these options. Unfortunately, I did not receive any reply from Turris support to my request.

To everybody with a self-rebooting Turris: this small patch can hopefully help you. Send an email to support with a link to this post and request this small kernel change.


#2

Good job! But one question remains. You said the turning point was the ISP change. I have no idea how a new ISP could trigger the problems you mentioned.
What are the connection type and configuration for the ISPs before and after?


#3

Good catch! No idea why OpenWRT has those disabled by default, enabled them now in nightly and unless it breaks something else, it will be part of the next release.


#4

My previous ISP used a cable connection. The cable modem simply exposed the external IP on its ethernet interface (very useful), so the Turris was used as a simple router without the need to dial anything. Then we were moving, and I did not use the Turris for 2-3 months. The new provider uses VDSL, so the Turris is now configured to dial PPPoE over a VDSL modem.

I don’t think the ISP change triggered this problem; more likely it was some software update during the offline time. But I am not sure and cannot prove anything.


#5

Thank you for accepting the proposed options. I am happy to hear that they will be part of the standard kernel (if everything goes right).