Hi,
I am seeing frequent, seemingly random reboots of my new MOX. It is an AE configuration. To track this down I enabled syslog forwarding: today I counted 15 reboots, yesterday 85. Usually nothing in the syslog offers a good explanation for them, but once I got a kernel stack trace:
2025-04-13T20:29:02+02:00 turris kernel: [ 9044.503432] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
2025-04-13T20:29:02+02:00 turris kernel: [ 9044.509563] rcu: #0111-...!: (0 ticks this GP) idle=297/0/0x3 softirq=23235/23235 fqs=1
2025-04-13T20:29:02+02:00 turris kernel: [ 9044.517648] #011(detected by 0, t=24007 jiffies, g=43241, q=19175)
2025-04-13T20:29:02+02:00 turris kernel: [ 9044.523754] Task dump for CPU 1:
2025-04-13T20:29:02+02:00 turris kernel: [ 9044.527079] task:swapper/1 state:R running task stack: 0 pid: 0 ppid: 1 flags:0x00000008
2025-04-13T20:29:02+02:00 turris kernel: [ 9044.537315] Call trace:
2025-04-13T20:29:02+02:00 turris kernel: [ 9044.539832] __switch_to+0xe0/0x124
2025-04-13T20:29:02+02:00 turris kernel: [ 9044.543438] 0xffffffc008c62930
2025-04-13T20:29:02+02:00 turris kernel: [ 9044.546677] rcu: rcu_sched kthread timer wakeup didn't happen for 17989 jiffies! g43241 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
2025-04-13T20:29:02+02:00 turris kernel: [ 9044.558340] rcu: #011Possible timer handling issue on cpu=1 timer-softirq=27775
2025-04-13T20:29:02+02:00 turris kernel: [ 9044.565607] rcu: rcu_sched kthread starved for 17990 jiffies! g43241 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=1
2025-04-13T20:29:02+02:00 turris kernel: [ 9044.576282] rcu: #011Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
2025-04-13T20:29:02+02:00 turris kernel: [ 9044.585520] rcu: RCU grace-period kthread stack dump:
2025-04-13T20:29:02+02:00 turris kernel: [ 9044.590725] task:rcu_sched state:I stack: 0 pid: 13 ppid: 2 flags:0x00000008
2025-04-13T20:29:02+02:00 turris kernel: [ 9044.599342] Call trace:
2025-04-13T20:29:02+02:00 turris kernel: [ 9044.601859] __switch_to+0xe0/0x124
2025-04-13T20:29:02+02:00 turris kernel: [ 9044.605456] __schedule+0x258/0x694
2025-04-13T20:29:02+02:00 turris kernel: [ 9044.609055] schedule+0x58/0xbc
2025-04-13T20:29:02+02:00 turris kernel: [ 9044.612292] schedule_timeout+0x7c/0xec
2025-04-13T20:29:02+02:00 turris kernel: [ 9044.616250] rcu_gp_fqs_loop+0xe8/0x390
2025-04-13T20:29:02+02:00 turris kernel: [ 9044.620208] rcu_gp_kthread+0xf4/0x12c
2025-04-13T20:29:02+02:00 turris kernel: [ 9044.624074] kthread+0x11c/0x130
2025-04-13T20:29:02+02:00 turris kernel: [ 9044.627402] ret_from_fork+0x10/0x20
2025-04-13T20:29:02+02:00 turris kernel: [ 9044.631087] rcu: Stack dump where RCU GP kthread last ran:
2025-04-13T20:29:02+02:00 turris kernel: [ 9044.636740] Task dump for CPU 1:
2025-04-13T20:29:02+02:00 turris kernel: [ 9044.640064] task:swapper/1 state:R running task stack: 0 pid: 0 ppid: 1 flags:0x00000008
2025-04-13T20:29:02+02:00 turris kernel: [ 9044.650294] Call trace:
2025-04-13T20:29:02+02:00 turris kernel: [ 9044.652811] __switch_to+0xe0/0x124
2025-04-13T20:29:02+02:00 turris kernel: [ 9044.656408] 0xffffffc008c62930
Any ideas what I could do to track this down?
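
For reference, here is a minimal sketch of one way to count reboots from the forwarded syslog. It assumes every kernel line carries the uptime in brackets, as in the trace above, so a reboot shows up as that uptime jumping backwards. The function name `count_reboots` and the sample lines are hypothetical, and kernel lines that arrive slightly out of order could be miscounted, so treat the result as an estimate.

```python
import re

# Matches the kernel uptime stamp, e.g. "kernel: [ 9044.503432]".
UPTIME_RE = re.compile(r"kernel: \[\s*(\d+\.\d+)\]")


def count_reboots(lines):
    """Count how often the kernel uptime goes backwards between log lines."""
    reboots = 0
    previous = None
    for line in lines:
        match = UPTIME_RE.search(line)
        if match is None:
            continue  # not a kernel line with an uptime stamp
        uptime = float(match.group(1))
        # A smaller uptime than the previous kernel line suggests a fresh boot.
        if previous is not None and uptime < previous:
            reboots += 1
        previous = uptime
    return reboots


# Hypothetical sample: two drops in uptime, i.e. two reboots.
sample = [
    "2025-04-13T20:29:02+02:00 turris kernel: [ 9044.503432] message a",
    "2025-04-13T20:29:02+02:00 turris kernel: [ 9044.509563] message b",
    "2025-04-13T20:31:10+02:00 turris kernel: [    0.000000] message c",
    "2025-04-13T20:31:22+02:00 turris kernel: [   12.300000] message d",
    "2025-04-13T20:40:05+02:00 turris kernel: [    0.000000] message e",
]
print(count_reboots(sample))
```

In practice the lines would come from the forwarded log file on the receiving host, for example `count_reboots(open(path))` with `path` pointing at wherever the syslog server stores them.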