Random kernel crashes on mvneta [HBS/HBL]

Hi I am experiencing random OOM killings crashes. I use HBL branch on TOS 7.0 now. I tried to look on the internet for general info on how to debug OOM crash and I have found only workarounds by clearing cache and stuff not relevant. How may I dig deeper to the real cause?

By now I don’t use many applications that are memory hungry. In normal operating state I have

root@router:~# free
              total        used        free      shared  buff/cache   availab
Mem:        2063488      354976      641040       83732     1067472     14991
Swap:      12582908           0    12582908

I will post a log from dmesg when it happens again. But I couldnt find a pattern. Sometimes its every two days sometimes two weeks uptime and it simply works. I also posted in:
Unprovoked OOM mass killing because it was the only thing that was related to the problem in the meantime but MGLRU should be disabled 2 months ago and I still experience OOM’s?

I know HBL is not supported by Turris Support so I ask Community.

P.S. I forgot to mention that I use Omnia.

That shows almost no available memory. Oh wait, you cropped the output from 1.5G available, so that just confused me.

I installed collectd recently to be able to monitor free memory but it seems consistant nothing extraordinary here

Just after reboots the cache is getting filled up slowly but I guess thats normal to use RAM that you have available

I’m going a bit off topic, but available is the interesting metric and the graph above isn’t very useful IMO. For example, /tmp(fs) memory will be counted into cached, but it’s not available for use (though if you have swap, it can get swapped like normal used memory).

So filling out /tmp with this amount of swap seems impossible. What might be the other causes?

Edit:

root@router:~# du -sh /tmp
83.1M   /tmp

What does dmesg say when this happens? OOM Killer usually output useful messages.

Its easier for me to restore uplink just by rebooting. But indeed there was some messages in dmesg. Next it will blow up I will save dmesg and post it here.

So this is what I get in dmesg:

dmesg
[65595.213964] CPU: 1 PID: 3080 Comm: netifd Not tainted 5.15.96 #0
[65595.220000] Hardware name: Marvell Armada 380/385 (Device Tree)
[65595.225936] [<c010e804>] (unwind_backtrace) from [<c010a46c>] (show_stack+0x10/0x14)
[65595.233710] [<c010a46c>] (show_stack) from [<c05f0f38>] (dump_stack_lvl+0x40/0x4c)
[65595.241309] [<c05f0f38>] (dump_stack_lvl) from [<c0282840>] (warn_alloc+0xec/0x188)
[65595.248993] [<c0282840>] (warn_alloc) from [<c02837a4>] (__alloc_pages+0xec8/0xf90)
[65595.256671] [<c02837a4>] (__alloc_pages) from [<c0113d00>] (__dma_alloc_buffer.constprop.0+0x34/0x1f8)
[65595.266004] [<c0113d00>] (__dma_alloc_buffer.constprop.0) from [<c0113ed8>] (simple_allocator_alloc+0x14/0x28)
[65595.276032] [<c0113ed8>] (simple_allocator_alloc) from [<c01136b8>] (__dma_alloc+0x11c/0x29c)
[65595.284581] [<c01136b8>] (__dma_alloc) from [<c011386c>] (arm_coherent_dma_alloc+0x34/0x3c)
[65595.292955] [<c011386c>] (arm_coherent_dma_alloc) from [<c01967c0>] (dma_alloc_attrs+0x10c/0x118)
[65595.301857] [<c01967c0>] (dma_alloc_attrs) from [<c078f468>] (mvneta_setup_txqs+0xec/0x278)
[65595.310234] [<c078f468>] (mvneta_setup_txqs) from [<c078f664>] (mvneta_open+0x70/0x2ec)
[65595.318261] [<c078f664>] (mvneta_open) from [<c0862564>] (__dev_open+0xd4/0x1b8)
[65595.325684] [<c0862564>] (__dev_open) from [<c0862a18>] (__dev_change_flags+0x194/0x21c)
[65595.333800] [<c0862a18>] (__dev_change_flags) from [<c0862ab4>] (dev_change_flags+0x14/0x44)
[65595.342264] [<c0862ab4>] (dev_change_flags) from [<c088c71c>] (dev_ioctl+0x120/0x65c)
[65595.350127] [<c088c71c>] (dev_ioctl) from [<c0834b4c>] (sock_ioctl+0x33c/0x588)
[65595.357467] [<c0834b4c>] (sock_ioctl) from [<c02be07c>] (sys_ioctl+0x564/0xc74)
[65595.364808] [<c02be07c>] (sys_ioctl) from [<c0100060>] (ret_fast_syscall+0x0/0x48)
[65595.372402] Exception stack(0xc5161fa8 to 0xc5161ff0)
[65595.377468] 1fa0:                   beeb5ccc 00000000 00000008 00008914 beeb5ccc beeb5cc0
[65595.385666] 1fc0: beeb5ccc 00000000 beeb5ccc 00000036 00000008 b6ef1018 0000201d 00000001
[65595.393863] 1fe0: 004dbcd4 beeb5b90 004be6b0 b6f271b4
[65595.399005] Mem-Info:
[65595.401297] active_anon:12860 inactive_anon:15184 isolated_anon:0
[65595.401297]  active_file:154181 inactive_file:248510 isolated_file:0
[65595.401297]  unevictable:4 dirty:8 writeback:0
[65595.401297]  slab_reclaimable:26319 slab_unreclaimable:15639
[65595.401297]  mapped:7895 shmem:5765 pagetables:582 bounce:0
[65595.401297]  kernel_misc_reclaimable:0
[65595.401297]  free:21103 free_pcp:179 free_cma:0
[65595.437732] Node 0 active_anon:51440kB inactive_anon:60232kB active_file:616724kB inactive_file:994040kB unevictable:16kB isolated(anon):0kB isolated(file):0kB mapped:31580kB dirty:32kB writeback:0kB shmem:23060kB writeback_tmp:0kB kernel_stack:2248kB pagetables:2328kB all_unreclaimable? no
[65595.463542] Normal free:74060kB min:16384kB low:20480kB high:24576kB reserved_highatomic:0KB active_anon:112kB inactive_anon:4428kB active_file:181044kB inactive_file:255092kB unevictable:0kB writepending:4kB present:786432kB managed:752768kB mlocked:0kB bounce:0kB free_pcp:464kB local_pcp:0kB free_cma:0kB
[65595.490756] lowmem_reserve[]: 0 10240 10240
[65595.494962] HighMem free:10352kB min:512kB low:7640kB high:14768kB reserved_highatomic:0KB active_anon:51244kB inactive_anon:55024kB active_file:435312kB inactive_file:738944kB unevictable:16kB writepending:28kB present:1310720kB managed:1310720kB mlocked:16kB bounce:0kB free_pcp:2308kB local_pcp:0kB free_cma:0kB
[65595.522776] lowmem_reserve[]: 0 0 0
[65595.526305] Normal: 3403*4kB (UME) 2390*8kB (UME) 1012*16kB (UME) 450*32kB (UME) 124*64kB (UM) 19*128kB (UM) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 73692kB
[65595.540835] HighMem: 2678*4kB (UM) 257*8kB (UM) 24*16kB (UM) 11*32kB (U) 5*64kB (U) 5*128kB (U) 5*256kB (U) 3*512kB (UM) 1*1024kB (M) 1*2048kB (M) 0*4096kB = 20352kB
[65595.555713] 410230 total pagecache pages
[65595.559660] 1077 pages in swap cache
[65595.563246] Swap cache stats: add 72711, delete 71651, find 9310/12956
[65595.569811] Free swap  = 12374524kB
[65595.573310] Total swap = 12582908kB
[65595.576809] 524288 pages RAM
[65595.579712] 327680 pages HighMem/MovableOnly
[65595.583997] 8416 pages reserved
[65595.587150] mvneta f1034000.ethernet eth2: mvneta_setup_txqs: can't create txq=7
[65596.700601] mvneta f1034000.ethernet eth2: mvneta_setup_txqs: can't create txq=4
[65596.708130] ------------[ cut here ]------------
[65596.712821] WARNING: CPU: 1 PID: 3080 at arch/arm/mm/dma-mapping.c:818 __arm_dma_free.constprop.0+0x144/0x1c4
[65596.722802] Freeing invalid buffer ef85f1e0
[65596.727000] Modules linked in: ath9k ath9k_common xt_connlimit qcserial pppoe ppp_async nf_conncount iptable_nat ath9k_hw ath10k_pci ath10k_core ath xt_state xt_nat xt_helper xt_conntrack xt_connmark xt_connbytes xt_REDIRECT xt_MASQUERADE xt_FLOWOFFLOAD xt_CT usb_wwan pppox ppp_generic nft_redir nft_ct nf_nat_ftp nf_nat nf_flow_table nf_conntrack_netlink nf_conntrack_ftp nf_conntrack mt7915e mt76_connac_lib mt76 mac80211 lz4 ipt_REJECT ebtable_nat ebtable_filter ebtable_broute cfg80211 xt_time xt_tcpudp xt_tcpmss xt_statistic xt_recent xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_ecn xt_dscp xt_comment xt_TCPMSS xt_NFQUEUE xt_NFLOG xt_LOG xt_HL xt_DSCP xt_CLASSIFY usbserial ums_usbat ums_sddr55 ums_sddr09 ums_karma ums_jumpshot ums_isd200 ums_freecom ums_datafab ums_cypress ums_alauda slhc sch_cake nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_quota nft_objref nft_numgen nft_log nft_limit nft_hash nft_counter nft_compat nfnetlink_queue nfnetlink_log nf_tables
[65596.727162]  nf_reject_ipv4 nf_log_syslog nf_defrag_ipv6 nf_defrag_ipv4 macvlan lz4_decompress lz4_compress iptable_raw iptable_mangle iptable_filter ipt_ECN ip_tables ebtables ebt_vlan ebt_stp ebt_redirect ebt_pkttype ebt_mark_m ebt_mark ebt_limit ebt_among ebt_802_3 crc_ccitt compat br_netfilter at24 fuse sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_tcindex cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit act_mirred act_gact configs i2c_mux_pca954x i2c_mux ledtrig_oneshot ledtrig_gpio cryptodev xt_set ip_set_list_set ip_set_hash_netportnet ip_set_hash_netport ip_set_hash_netnet ip_set_hash_netiface ip_set_hash_net ip_set_hash_mac ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set_hash_ipmark ip_set_hash_ipmac ip_set_hash_ip ip_set_bitmap_port ip_set_bitmap_ipmac ip_set_bitmap_ip ip_set nfnetlink ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 nfsv4 nfsd nfs msdos ifb sit ip6_tunnel netlink_diag tunnel6 tunnel4
[65596.814927]  ip_tunnel rpcsec_gss_krb5 auth_rpcgss veth tun ntfs lockd sunrpc grace hfsplus hfs cdrom cifs cifs_md4 cifs_arc4 autofs4 dns_resolver nls_utf8 nls_koi8_r nls_cp1255 nls_iso8859_6 nls_iso8859_2 nls_iso8859_15 nls_iso8859_13 nls_cp932 nls_cp866 nls_cp864 nls_cp862 nls_cp852 nls_cp850 nls_cp775 nls_cp1251 nls_cp1250 xts crypto_user algif_skcipher algif_rng algif_hash algif_aead af_alg sha512_generic seqiv jitterentropy_rng drbg pcbc md5 md4 hmac ghash_arm_ce ecb cts cmac arc4 vfat fat nls_iso8859_1 nls_cp437 uas leds_turris_omnia ledtrig_transient ahci fsl_mph_dr_of ehci_fsl xfs exfat dm_mirror dm_region_hash dm_log dm_crypt dm_mod dax encrypted_keys trusted tpm oid_registry asn1_encoder asn1_decoder
[65596.965410] CPU: 1 PID: 3080 Comm: netifd Not tainted 5.15.96 #0
[65596.971445] Hardware name: Marvell Armada 380/385 (Device Tree)
[65596.977386] [<c010e804>] (unwind_backtrace) from [<c010a46c>] (show_stack+0x10/0x14)
[65596.985159] [<c010a46c>] (show_stack) from [<c05f0f38>] (dump_stack_lvl+0x40/0x4c)
[65596.992757] [<c05f0f38>] (dump_stack_lvl) from [<c012d45c>] (__warn+0x9c/0xf8)
[65597.000011] [<c012d45c>] (__warn) from [<c012d530>] (warn_slowpath_fmt+0x78/0xa8)
[65597.007518] [<c012d530>] (warn_slowpath_fmt) from [<c01147a0>] (__arm_dma_free.constprop.0+0x144/0x1c4)
[65597.016941] [<c01147a0>] (__arm_dma_free.constprop.0) from [<c07893a4>] (mvneta_cleanup_txqs+0x78/0x15c)
[65597.026467] [<c07893a4>] (mvneta_cleanup_txqs) from [<c078f5ec>] (mvneta_setup_txqs+0x270/0x278)
[65597.035293] [<c078f5ec>] (mvneta_setup_txqs) from [<c078f664>] (mvneta_open+0x70/0x2ec)
[65597.043327] [<c078f664>] (mvneta_open) from [<c0862564>] (__dev_open+0xd4/0x1b8)
[65597.050750] [<c0862564>] (__dev_open) from [<c0862a18>] (__dev_change_flags+0x194/0x21c)
[65597.058870] [<c0862a18>] (__dev_change_flags) from [<c0862ab4>] (dev_change_flags+0x14/0x44)
[65597.067342] [<c0862ab4>] (dev_change_flags) from [<c088c71c>] (dev_ioctl+0x120/0x65c)
[65597.075202] [<c088c71c>] (dev_ioctl) from [<c0834b4c>] (sock_ioctl+0x33c/0x588)
[65597.082542] [<c0834b4c>] (sock_ioctl) from [<c02be07c>] (sys_ioctl+0x564/0xc74)
[65597.089886] [<c02be07c>] (sys_ioctl) from [<c0100060>] (ret_fast_syscall+0x0/0x48)
[65597.097491] Exception stack(0xc5161fa8 to 0xc5161ff0)
[65597.102560] 1fa0:                   beeb5ccc 00000000 00000008 00008914 beeb5ccc beeb5cc0
[65597.110758] 1fc0: beeb5ccc 00000000 beeb5ccc 00000036 00000008 b6ef1018 0000201d 00000001
[65597.118962] 1fe0: 004dbcd4 beeb5b90 004be6b0 b6f271b4
[65597.124061] ---[ end trace 88602c448c33911e ]---
[65597.128717] ------------[ cut here ]------------
[65597.133352] WARNING: CPU: 1 PID: 3080 at arch/arm/mm/dma-mapping.c:818 __arm_dma_free.constprop.0+0x144/0x1c4
[65597.143310] Freeing invalid buffer c7d79791
[65597.147508] Modules linked in: ath9k ath9k_common xt_connlimit qcserial pppoe ppp_async nf_conncount iptable_nat ath9k_hw ath10k_pci ath10k_core ath xt_state xt_nat xt_helper xt_conntrack xt_connmark xt_connbytes xt_REDIRECT xt_MASQUERADE xt_FLOWOFFLOAD xt_CT usb_wwan pppox ppp_generic nft_redir nft_ct nf_nat_ftp nf_nat nf_flow_table nf_conntrack_netlink nf_conntrack_ftp nf_conntrack mt7915e mt76_connac_lib mt76 mac80211 lz4 ipt_REJECT ebtable_nat ebtable_filter ebtable_broute cfg80211 xt_time xt_tcpudp xt_tcpmss xt_statistic xt_recent xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_ecn xt_dscp xt_comment xt_TCPMSS xt_NFQUEUE xt_NFLOG xt_LOG xt_HL xt_DSCP xt_CLASSIFY usbserial ums_usbat ums_sddr55 ums_sddr09 ums_karma ums_jumpshot ums_isd200 ums_freecom ums_datafab ums_cypress ums_alauda slhc sch_cake nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_quota nft_objref nft_numgen nft_log nft_limit nft_hash nft_counter nft_compat nfnetlink_queue nfnetlink_log nf_tables
[65597.147664]  nf_reject_ipv4 nf_log_syslog nf_defrag_ipv6 nf_defrag_ipv4 macvlan lz4_decompress lz4_compress iptable_raw iptable_mangle iptable_filter ipt_ECN ip_tables ebtables ebt_vlan ebt_stp ebt_redirect ebt_pkttype ebt_mark_m ebt_mark ebt_limit ebt_among ebt_802_3 crc_ccitt compat br_netfilter at24 fuse sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_tcindex cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit act_mirred act_gact configs i2c_mux_pca954x i2c_mux ledtrig_oneshot ledtrig_gpio cryptodev xt_set ip_set_list_set ip_set_hash_netportnet ip_set_hash_netport ip_set_hash_netnet ip_set_hash_netiface ip_set_hash_net ip_set_hash_mac ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set_hash_ipmark ip_set_hash_ipmac ip_set_hash_ip ip_set_bitmap_port ip_set_bitmap_ipmac ip_set_bitmap_ip ip_set nfnetlink ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 nfsv4 nfsd nfs msdos ifb sit ip6_tunnel netlink_diag tunnel6 tunnel4
[65597.235438]  ip_tunnel rpcsec_gss_krb5 auth_rpcgss veth tun ntfs lockd sunrpc grace hfsplus hfs cdrom cifs cifs_md4 cifs_arc4 autofs4 dns_resolver nls_utf8 nls_koi8_r nls_cp1255 nls_iso8859_6 nls_iso8859_2 nls_iso8859_15 nls_iso8859_13 nls_cp932 nls_cp866 nls_cp864 nls_cp862 nls_cp852 nls_cp850 nls_cp775 nls_cp1251 nls_cp1250 xts crypto_user algif_skcipher algif_rng algif_hash algif_aead af_alg sha512_generic seqiv jitterentropy_rng drbg pcbc md5 md4 hmac ghash_arm_ce ecb cts cmac arc4 vfat fat nls_iso8859_1 nls_cp437 uas leds_turris_omnia ledtrig_transient ahci fsl_mph_dr_of ehci_fsl xfs exfat dm_mirror dm_region_hash dm_log dm_crypt dm_mod dax encrypted_keys trusted tpm oid_registry asn1_encoder asn1_decoder
[65597.385892] CPU: 1 PID: 3080 Comm: netifd Tainted: G        W         5.15.96 #0
[65597.393315] Hardware name: Marvell Armada 380/385 (Device Tree)
[65597.399252] [<c010e804>] (unwind_backtrace) from [<c010a46c>] (show_stack+0x10/0x14)
[65597.407025] [<c010a46c>] (show_stack) from [<c05f0f38>] (dump_stack_lvl+0x40/0x4c)
[65597.414621] [<c05f0f38>] (dump_stack_lvl) from [<c012d45c>] (__warn+0x9c/0xf8)
[65597.421874] [<c012d45c>] (__warn) from [<c012d530>] (warn_slowpath_fmt+0x78/0xa8)
[65597.429381] [<c012d530>] (warn_slowpath_fmt) from [<c01147a0>] (__arm_dma_free.constprop.0+0x144/0x1c4)
[65597.438802] [<c01147a0>] (__arm_dma_free.constprop.0) from [<c07893a4>] (mvneta_cleanup_txqs+0x78/0x15c)
[65597.448317] [<c07893a4>] (mvneta_cleanup_txqs) from [<c078f5ec>] (mvneta_setup_txqs+0x270/0x278)
[65597.457130] [<c078f5ec>] (mvneta_setup_txqs) from [<c078f664>] (mvneta_open+0x70/0x2ec)
[65597.465155] [<c078f664>] (mvneta_open) from [<c0862564>] (__dev_open+0xd4/0x1b8)
[65597.472577] [<c0862564>] (__dev_open) from [<c0862a18>] (__dev_change_flags+0x194/0x21c)
[65597.480692] [<c0862a18>] (__dev_change_flags) from [<c0862ab4>] (dev_change_flags+0x14/0x44)
[65597.489155] [<c0862ab4>] (dev_change_flags) from [<c088c71c>] (dev_ioctl+0x120/0x65c)
[65597.497020] [<c088c71c>] (dev_ioctl) from [<c0834b4c>] (sock_ioctl+0x33c/0x588)
[65597.504381] [<c0834b4c>] (sock_ioctl) from [<c02be07c>] (sys_ioctl+0x564/0xc74)
[65597.511725] [<c02be07c>] (sys_ioctl) from [<c0100060>] (ret_fast_syscall+0x0/0x48)
[65597.519318] Exception stack(0xc5161fa8 to 0xc5161ff0)
[65597.524382] 1fa0:                   beeb5ccc 00000000 00000008 00008914 beeb5ccc beeb5cc0
[65597.532581] 1fc0: beeb5ccc 00000000 beeb5ccc 00000036 00000008 b6ef1018 0000201d 00000001
[65597.540777] 1fe0: 004dbcd4 beeb5b90 004be6b0 b6f271b4
[65597.545867] ---[ end trace 88602c448c33911f ]---

Any kernel guru here? I suspected wifi driver (EDIT: or acctually mvneta so SFP maybe) so I switched to basic ones of WiFi and will see how it goes.

@rmk Do you mind taking a look. Maybe there was some buffer overflow (underflow) In the code written here: [HBL] TurrisOS 6.0-alpha2 Halny HL-GSFP SFP GPON Stick problems

Kernel exception is prefixed by:

before in dmesg
[65562.563741] mvneta f1034000.ethernet eth2: Link is Down
[65562.574825] sfp sfp: module transmit fault indicated
[65563.901033] sfp sfp: module transmit fault recovered
[65563.906071] mvneta f1034000.ethernet eth2: Link is Up - 1Gbps/Full - flow control rx/tx
[65565.826141] mvneta f1034000.ethernet eth2: Link is Down
[65565.844573] mvneta f1034000.ethernet eth2: configuring for inband/1000base-x link mode
[65565.888075] mvneta f1034000.ethernet eth2: Link is Up - 1Gbps/Full - flow control rx/tx
[65565.898356] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[65566.975551] mvneta f1034000.ethernet eth2: Link is Down
[65567.006071] mvneta f1034000.ethernet eth2: configuring for inband/1000base-x link mode
[65567.050087] mvneta f1034000.ethernet eth2: Link is Up - 1Gbps/Full - flow control rx/tx
[65567.059863] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[65568.175397] mvneta f1034000.ethernet eth2: Link is Down
[65568.203978] mvneta f1034000.ethernet eth2: configuring for inband/1000base-x link mode
[65568.249943] mvneta f1034000.ethernet eth2: Link is Up - 1Gbps/Full - flow control rx/tx
[65568.271468] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[65569.316967] mvneta f1034000.ethernet eth2: Link is Down
[65569.344401] mvneta f1034000.ethernet eth2: configuring for inband/1000base-x link mode
[65569.391793] mvneta f1034000.ethernet eth2: Link is Up - 1Gbps/Full - flow control rx/tx
[65569.402413] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[65570.611083] mvneta f1034000.ethernet eth2: Link is Down
[65570.664375] mvneta f1034000.ethernet eth2: configuring for inband/1000base-x link mode
[65570.708118] mvneta f1034000.ethernet eth2: Link is Up - 1Gbps/Full - flow control rx/tx
[65570.718825] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[65571.765190] mvneta f1034000.ethernet eth2: Link is Down
[65571.798280] mvneta f1034000.ethernet eth2: configuring for inband/1000base-x link mode
[65571.844701] mvneta f1034000.ethernet eth2: Link is Up - 1Gbps/Full - flow control rx/tx
[65571.853631] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[65572.885231] mvneta f1034000.ethernet eth2: Link is Down
[65572.916059] mvneta f1034000.ethernet eth2: configuring for inband/1000base-x link mode
[65572.959104] mvneta f1034000.ethernet eth2: Link is Up - 1Gbps/Full - flow control rx/tx
[65572.972471] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[65574.085054] mvneta f1034000.ethernet eth2: Link is Down
[65574.113590] mvneta f1034000.ethernet eth2: configuring for inband/1000base-x link mode
[65574.157126] mvneta f1034000.ethernet eth2: Link is Up - 1Gbps/Full - flow control rx/tx
[65574.180280] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[65575.244995] mvneta f1034000.ethernet eth2: Link is Down
[65575.262749] mvneta f1034000.ethernet eth2: configuring for inband/1000base-x link mode
[65575.306123] mvneta f1034000.ethernet eth2: Link is Up - 1Gbps/Full - flow control rx/tx
[65575.318127] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[65576.386129] mvneta f1034000.ethernet eth2: Link is Down
[65576.404710] mvneta f1034000.ethernet eth2: configuring for inband/1000base-x link mode
[65576.448596] mvneta f1034000.ethernet eth2: Link is Up - 1Gbps/Full - flow control rx/tx
[65576.466374] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[65577.504983] mvneta f1034000.ethernet eth2: Link is Down
[65577.518225] mvneta f1034000.ethernet eth2: configuring for inband/1000base-x link mode
[65577.562131] mvneta f1034000.ethernet eth2: Link is Up - 1Gbps/Full - flow control rx/tx
[65577.580751] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[65578.673795] mvneta f1034000.ethernet eth2: Link is Down
[65578.698130] mvneta f1034000.ethernet eth2: configuring for inband/1000base-x link mode
[65578.742133] mvneta f1034000.ethernet eth2: Link is Up - 1Gbps/Full - flow control rx/tx
[65578.760587] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[65579.814734] mvneta f1034000.ethernet eth2: Link is Down
[65579.833663] mvneta f1034000.ethernet eth2: configuring for inband/1000base-x link mode
[65579.878862] mvneta f1034000.ethernet eth2: Link is Up - 1Gbps/Full - flow control rx/tx
[65579.907934] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[65581.034394] mvneta f1034000.ethernet eth2: Link is Down
[65581.057063] mvneta f1034000.ethernet eth2: configuring for inband/1000base-x link mode
[65581.100221] mvneta f1034000.ethernet eth2: Link is Up - 1Gbps/Full - flow control rx/tx
[65581.108311] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[65582.194777] mvneta f1034000.ethernet eth2: Link is Down
[65582.218014] mvneta f1034000.ethernet eth2: configuring for inband/1000base-x link mode
[65582.261147] mvneta f1034000.ethernet eth2: Link is Up - 1Gbps/Full - flow control rx/tx
[65582.272565] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[65583.334444] mvneta f1034000.ethernet eth2: Link is Down
[65583.351512] mvneta f1034000.ethernet eth2: configuring for inband/1000base-x link mode
[65583.395196] mvneta f1034000.ethernet eth2: Link is Up - 1Gbps/Full - flow control rx/tx
[65583.409879] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[65584.504344] mvneta f1034000.ethernet eth2: Link is Down
[65584.522677] mvneta f1034000.ethernet eth2: configuring for inband/1000base-x link mode
[65584.566158] mvneta f1034000.ethernet eth2: Link is Up - 1Gbps/Full - flow control rx/tx
[65584.599572] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[65585.654408] mvneta f1034000.ethernet eth2: Link is Down
[65585.669509] mvneta f1034000.ethernet eth2: configuring for inband/1000base-x link mode
[65585.713163] mvneta f1034000.ethernet eth2: Link is Up - 1Gbps/Full - flow control rx/tx
[65585.733836] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[65586.774176] mvneta f1034000.ethernet eth2: Link is Down
[65586.820191] mvneta f1034000.ethernet eth2: configuring for inband/1000base-x link mode
[65586.864170] mvneta f1034000.ethernet eth2: Link is Up - 1Gbps/Full - flow control rx/tx
[65586.888694] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[65587.024335] mvneta f1034000.ethernet eth2: Link is Down
[65587.046489] mvneta f1034000.ethernet eth2: configuring for inband/1000base-x link mode
[65587.090929] mvneta f1034000.ethernet eth2: Link is Up - 1Gbps/Full - flow control rx/tx
[65587.099060] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[65588.193956] mvneta f1034000.ethernet eth2: Link is Down
[65588.219131] mvneta f1034000.ethernet eth2: configuring for inband/1000base-x link mode
[65588.263179] mvneta f1034000.ethernet eth2: Link is Up - 1Gbps/Full - flow control rx/tx
[65588.271729] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[65589.345547] mvneta f1034000.ethernet eth2: Link is Down
[65589.379953] mvneta f1034000.ethernet eth2: configuring for inband/1000base-x link mode
[65589.423182] mvneta f1034000.ethernet eth2: Link is Up - 1Gbps/Full - flow control rx/tx
[65589.433099] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[65590.503766] mvneta f1034000.ethernet eth2: Link is Down
[65590.568019] mvneta f1034000.ethernet eth2: configuring for inband/1000base-x link mode
[65590.611180] mvneta f1034000.ethernet eth2: Link is Up - 1Gbps/Full - flow control rx/tx
[65590.629667] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[65591.663691] mvneta f1034000.ethernet eth2: Link is Down
[65591.682233] mvneta f1034000.ethernet eth2: configuring for inband/1000base-x link mode
[65591.726190] mvneta f1034000.ethernet eth2: Link is Up - 1Gbps/Full - flow control rx/tx
[65591.739004] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[65592.824067] mvneta f1034000.ethernet eth2: Link is Down
[65592.846717] mvneta f1034000.ethernet eth2: configuring for inband/1000base-x link mode
[65592.890197] mvneta f1034000.ethernet eth2: Link is Up - 1Gbps/Full - flow control rx/tx
[65592.903017] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[65593.963523] mvneta f1034000.ethernet eth2: Link is Down
[65593.987744] mvneta f1034000.ethernet eth2: configuring for inband/1000base-x link mode
[65594.031203] mvneta f1034000.ethernet eth2: Link is Up - 1Gbps/Full - flow control rx/tx
[65594.039548] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[65595.174035] mvneta f1034000.ethernet eth2: Link is Down
[65595.203246] netifd: page allocation failure: order:6, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0
[65595.213964] CPU: 1 PID: 3080 Comm: netifd Not tainted 5.15.96 #0

After reboot everything is back operational and the crash happens every 2 days or maximally every 2 weeks.

Now when I think of it. Comparing to my second Omnia running as dumb AP (same branch HBL) this doesnt happen. And the only thing that is different we might say is SFP port in use. I really suspect SFP.

There’s nothing about OOM kills on those logs. Where did you see any evidence of the oomkiller acting?

Those dmesg logs show kernel crashes, not OOMs.

Thanks @cassianoleal Indeed I guessed its OOM so I changed the subject of that problem accordingly. So crashes not OOM as I thought first.

Yes the functions crashing and showing in the backtrace are in the mvneta.c code. So it’s definitely where you need to look

To bad I am not a programmer. Especially C code. @rmk I politely ask if you could take a look.

I don’t think this would be created by the addition of the support for the HALNy SFP module, because nothing was changed in relation to memory allocation - it’s all already-allocated data structures or static (const) data.

What it looks like to me is a memory allocation problem “netifd: page allocation failure” kind of gives it away - and it’s an order 6 allocation, meaning its a request for 256KiB of contiguous memory. Probably due to memory fragmentation.

I think a question to ask is why the network interface is bouncing up and down - and looking at the backtraces, it looks like something in userspace (netifd?) is responsible for bringing the interface up and down. Certainly, this memory allocation failure backtrace points to netifd requesting that the network interface is brought up.

It does look like mvneta_cleanup_txqs() is failing after mvneta_setup_txqs() fails, which won’t be helping - indeed, looking at the code in mainline, we will end up freeing memory that was previously freed. Here’s a patch for that:

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 0e39d199ff06..2cad76d0a50e 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -3549,6 +3549,8 @@ static void mvneta_txq_sw_deinit(struct mvneta_port *pp,

        netdev_tx_reset_queue(nq);

+       txq->buf               = NULL;
+       txq->tso_hdrs          = NULL;
        txq->descs             = NULL;
        txq->last_desc         = 0;
        txq->next_desc_to_proc = 0;

At least that should reduce the number of kernel warnings, possibly down to just the memory allocation failure. If that is down to just memory fragmentation, then I think the only solution is to check that one has CMA enabled and has enough CMA to satisfy mvneta in the presence of fragmentation.

1 Like

Thanky you rmk for the insight. What I noticed is when the crash happens the LED of WAN interface is not blinking. Need to wait it happens again to verify if the interface is flapping between up/down.

Edit:
AFAICT the crash was not there when I switched to HBL so it must be a regression in recent kernels. Could someone from Turris Team apply the patch in HBL? I will test it for some time and report back.

Hey @mvasilek or @miska, you should look at the provided patch by the Linux Kernel developer and apply it to the HBL.

1 Like
root@router:~# cat /proc/meminfo | grep -i cma
root@router:~# dmesg | grep -i cma
[    0.000000] Memory: 2062464K/2097152K available (10240K k
ernel code, 672K rwdata, 2184K rodata, 1024K init, 259K bss,
 34688K reserved, 0K cma-reserved, 1310720K highmem)

So it seems like CMA is disabled.

Edit: But enabling it in kernel should be fairly simple as adding

CONFIG_CMA=y

Here:

With the apropriete comment. But one change at the time and testing.

@mvasilek / @miska Could you apply the patch? Crashes happen every two days and it’s anoyying and posibble fix is ready. I would suggest to first apply the patch. I will report if mvneta is missing CMA and if so then lets enable it in kernel.

But this could affect MOX and Turris1.x as well. Thats why there is HBL to test stuff. Thanks

Edit: I reported issue on gitlab. Maybe like that devs will see it:

I’ve sent a RFC patch series to netdev that removes the order-6 allocation entirely - it’s not a simple fix so its going to be targetted for net-next. However, getting some testing from someone with a reliable reproducer would be great.

https://lore.kernel.org/netdev/ZCsbJ4nG+So%2Fn9qY@shell.armlinux.org.uk/T/#t

2 Likes

Thanks for explaining the root cause of the problem. I hope the patches are going to be backported to Turris soon.