LAN to WAN routing broken (how to diagnose and fix?)

I churned ISPs today and, once that succeeded, everything worked fine for an afternoon. I was about to celebrate. Then I rebooted my Omnia, and since then LAN-to-WAN routing has been broken. I cannot work out why. So I rolled back to yesterday’s snapshot, then reconfigured the wan interface for my new ISP (the PPPoE username and password, and the eth2.100 interface to specify the ISP’s VLAN), and the wan interface came up fine.

So WAN is up, but nothing on the LAN has internet connectivity: LAN-to-WAN routing is broken. I have looked, but I am blind to why, or even to what could be causing this, so before I retire for the night I am hoping an experienced eye can cast a glance at the problem.

Here it is in a nutshell, using 8.8.8.8 as the target IP (just a memorable Google nameserver; the same holds for every WAN IP, while LAN IPs are fine):

$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.0.1     0.0.0.0         UG    20600  0        0 wlp11s0
169.254.0.0     0.0.0.0         255.255.0.0     U     1000   0        0 wlp11s0
192.168.0.0     0.0.0.0         255.255.252.0   U     600    0        0 wlp11s0
$ traceroute 8.8.8.8
traceroute to 8.8.8.8 (8.8.8.8), 30 hops max, 60 byte packets
 1  _gateway (192.168.0.1)  0.878 ms  0.831 ms  1.534 ms
 2  _gateway (192.168.0.1)  2.196 ms  4.436 ms  6.640 ms
$ ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
From 192.168.0.1 icmp_seq=1 Destination Port Unreachable
From 192.168.0.1 icmp_seq=2 Destination Port Unreachable
From 192.168.0.1 icmp_seq=3 Destination Port Unreachable
From 192.168.0.1 icmp_seq=4 Destination Port Unreachable

I get similar results from every device on the LAN, but if I SSH to the Omnia and try from there:

# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         203.134.50.152  0.0.0.0         UG    0      0        0 pppoe-wan
10.8.0.0        10.8.0.2        255.255.255.0   UG    0      0        0 tun0
10.8.0.2        0.0.0.0         255.255.255.255 UH    0      0        0 tun0
192.168.0.0     0.0.0.0         255.255.252.0   U     0      0        0 br-lan
203.134.50.152  0.0.0.0         255.255.255.255 UH    0      0        0 pppoe-wan
# traceroute 8.8.8.8
traceroute to 8.8.8.8 (8.8.8.8), 30 hops max, 38 byte packets
 1  lo10.lns22.melbvoc.vic.vocus.network (203.134.50.152)  12.882 ms  10.537 ms  11.586 ms
 2  ae12-211.edg01.pmelnxd.vic.vocus.network (203.134.52.228)  13.424 ms  11.583 ms  ae12-111.edg01.melbvoc.vic.vocus.network (203.134.31.228)  11.071 ms
 3  142.250.164.244 (142.250.164.244)  13.803 ms  ae0.edg01.pmelnxd.vic.vocus.network (203.134.25.233)  11.622 ms  142.250.164.244 (142.250.164.244)  11.741 ms
 4  142.250.164.244 (142.250.164.244)  11.430 ms  *  12.035 ms
 5  dns.google (8.8.8.8)  11.554 ms  *  11.457 ms
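
One hedged way to read the two sets of results above: the Omnia itself reaches 8.8.8.8, but a LAN client’s ping is answered by 192.168.0.1 with “Destination Port Unreachable”. An ICMP echo request has no ports, so a port-unreachable reply generated by the router fits a REJECT action in the router’s firewall (iptables’ REJECT target answers with icmp-port-unreachable by default) better than a missing route. The sketch below is a hypothetical helper, not part of Turris OS or iputils: it maps a subset of the messages Linux ping prints for ICMP type 3 (Destination Unreachable) to their codes and attaches that rough hint.

```python
import re

# A subset of ICMP type 3 (Destination Unreachable) codes, keyed by the
# message text Linux iputils ping prints for each of them.
UNREACHABLE_CODES = {
    "Destination Net Unreachable": 0,
    "Destination Host Unreachable": 1,
    "Destination Protocol Unreachable": 2,
    "Destination Port Unreachable": 3,
    "Destination Net Prohibited": 9,
    "Destination Host Prohibited": 10,
    "Packet filtered": 13,
}

# Matches "From 192.168.0.1 icmp_seq=1 Destination Port Unreachable" and
# "From _gateway (192.168.0.1) icmp_seq=1 ..." (sender taken non-greedily).
ERROR_LINE = re.compile(r"^From (.+?) icmp_seq=(\d+) (.+)$")

# Codes that, in reply to an ICMP echo, usually come from a firewall decision:
# port unreachable is the default answer of iptables' REJECT target, and
# 9/10/13 are explicit administrative prohibitions.
FILTER_CODES = {3, 9, 10, 13}


def classify_ping_line(line):
    """Return (sender, seq, code, hint) for an error line, or None otherwise."""
    match = ERROR_LINE.match(line.strip())
    if match is None:
        return None
    sender, seq, message = match.group(1), int(match.group(2)), match.group(3)
    code = UNREACHABLE_CODES.get(message.strip())
    if code is None:
        hint = "unrecognised message"
    elif code in FILTER_CODES:
        hint = "likely rejected by a firewall on " + sender
    else:
        hint = "likely a routing or neighbour problem near " + sender
    return sender, seq, code, hint


if __name__ == "__main__":
    sample = "From 192.168.0.1 icmp_seq=1 Destination Port Unreachable"
    print(classify_ping_line(sample))
```

Normal echo replies (“64 bytes from …”) return None, so the helper can be run over a whole ping transcript and only the error lines produce a verdict.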

and the network config on the Omnia:

# cat /etc/config/network 

config interface 'loopback'
	option proto 'static'
	option ipaddr '127.0.0.1'
	option netmask '255.0.0.0'
	option ifname 'lo'

config globals 'globals'
	option ula_prefix 'fd55:c3f2:02a6::/48'

config interface 'lan'
	option force_link '1'
	option type 'bridge'
	option proto 'static'
	option ip6assign '60'
	option _turris_mode 'managed'
	list ifname 'lan0'
	list ifname 'lan1'
	list ifname 'lan2'
	list ifname 'lan3'
	list ifname 'lan4'
	list ipaddr '192.168.0.1/22'

config interface 'wan'
	option proto 'pppoe'
	option ipv6 '1'
	option username 'username'
	option ifname 'eth2.100'
	option password 'password'

config interface 'wan6'
	option _orig_ifname '@wan'
	option _orig_bridge 'false'
	option proto 'pppoe'
	option username 'oldISPusername'
	option password 'oldISPpassword'
	option ipv6 'auto'
	option ifname '@wan'

config route

config interface 'vpn0'
	option proto 'none'
	option auto '1'
	option ifname 'tun0'

config interface 'vpn_turris'
	option enabled '0'

and a few screenshots:

Alas, I’m not good at reading kernel routing tables and will need to do some reading up on that on the morrow. This evening I am bamboozled by how restoring yesterday’s snapshot, which was a fully functional Omnia on my previous ISP (as it had been for years), can produce one with no LAN-to-WAN routing. Moreover, how can simply rebooting the router bring this about, given that it routed LAN to WAN perfectly for an afternoon after today’s churn? These mysteries perplex me.
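As a reading aid for the routing tables above: for each destination the kernel picks the matching route with the longest prefix (the Genmask with the most 1-bits), and among equally specific routes the one with the lowest metric. A minimal sketch using Python’s standard ipaddress module, assuming the eight-column `route -n` layout shown in this post; it deliberately ignores policy routing rules and multiple routing tables, so it simplifies what the kernel actually does.

```python
import ipaddress


def parse_route_n(text):
    """Parse `route -n` output into (network, gateway, metric, iface) tuples."""
    routes = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have eight columns and start with a dotted-quad address;
        # this skips the title line and the column header.
        if len(fields) != 8 or not fields[0][0].isdigit():
            continue
        dest, gateway, genmask, _flags, metric, _ref, _use, iface = fields
        network = ipaddress.IPv4Network(f"{dest}/{genmask}")
        routes.append((network, gateway, int(metric), iface))
    return routes


def lookup(routes, address):
    """Pick the longest matching prefix; break ties by the lowest metric."""
    addr = ipaddress.IPv4Address(address)
    matches = [route for route in routes if addr in route[0]]
    if not matches:
        return None
    return max(matches, key=lambda route: (route[0].prefixlen, -route[2]))


if __name__ == "__main__":
    omnia = """\
0.0.0.0         203.134.50.152  0.0.0.0         UG    0      0        0 pppoe-wan
192.168.0.0     0.0.0.0         255.255.252.0   U     0      0        0 br-lan
"""
    routes = parse_route_n(omnia)
    for target in ("8.8.8.8", "192.168.3.20"):
        network, gateway, metric, iface = lookup(routes, target)
        print(f"{target}: {network} via {gateway} dev {iface}")
```

Under this reading, the Omnia sends 8.8.8.8 out through the default route on pppoe-wan, keeps anything in 192.168.0.0/22 (192.168.0.0 to 192.168.3.255) on br-lan, and the laptop sends every non-LAN address to 192.168.0.1. Both tables look sane, which points away from routing and towards something on the router that drops or rejects forwarded traffic.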

It affects every device on the LAN that I try, so it isn’t an end-device issue. The Omnia can see the WAN, and I can wget any old site from it. On the LAN I can reach the Omnia and every other box. I also host some websites internally, with NAT forwarding the requests to them, and judging by the log files incoming requests are not reaching the servers either.

That suggests a LAN/WAN disconnect in routing.

What could cause this? How can it be diagnosed, and/or fixed?

And a few more diagnostic results, from which I have drawn no conclusions yet:

# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 532
    link/ether d8:58:d7:00:62:e7 brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 532
    link/ether d8:58:d7:00:62:e5 brd ff:ff:ff:ff:ff:ff
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 532
    link/ether d8:58:d7:00:62:e6 brd ff:ff:ff:ff:ff:ff
5: lan0@eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-lan state UP mode DEFAULT group default qlen 1000
    link/ether d8:58:d7:00:62:e5 brd ff:ff:ff:ff:ff:ff
6: lan1@eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-lan state UP mode DEFAULT group default qlen 1000
    link/ether d8:58:d7:00:62:e5 brd ff:ff:ff:ff:ff:ff
7: lan2@eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-lan state UP mode DEFAULT group default qlen 1000
    link/ether d8:58:d7:00:62:e5 brd ff:ff:ff:ff:ff:ff
8: lan3@eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-lan state UP mode DEFAULT group default qlen 1000
    link/ether d8:58:d7:00:62:e5 brd ff:ff:ff:ff:ff:ff
9: lan4@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-lan state UP mode DEFAULT group default qlen 1000
    link/ether d8:58:d7:00:62:e7 brd ff:ff:ff:ff:ff:ff
10: ip6tnl0@NONE: <NOARP> mtu 1452 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/tunnel6 :: brd ::
11: sit0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/sit 0.0.0.0 brd 0.0.0.0
12: ifb0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 32
    link/ether 76:a1:f5:a0:65:85 brd ff:ff:ff:ff:ff:ff
13: ifb1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 32
    link/ether fe:c8:53:b3:f9:7e brd ff:ff:ff:ff:ff:ff
14: gre0@NONE: <NOARP> mtu 1476 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/gre 0.0.0.0 brd 0.0.0.0
15: gretap0@NONE: <BROADCAST,MULTICAST> mtu 1462 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
16: erspan0@NONE: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
17: wlan0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 04:f0:21:1c:bc:07 brd ff:ff:ff:ff:ff:ff
19: br-lan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether d8:58:d7:00:62:e5 brd ff:ff:ff:ff:ff:ff
25: wlan1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-lan state UP mode DEFAULT group default qlen 1000
    link/ether 04:f0:21:23:15:70 brd ff:ff:ff:ff:ff:ff
38: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN mode DEFAULT group default qlen 100
    link/none 
45: eth2.100@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether d8:58:d7:00:62:e6 brd ff:ff:ff:ff:ff:ff
46: pppoe-wan: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1492 qdisc fq_codel state UNKNOWN mode DEFAULT group default qlen 3
    link/ppp 
root@Cerberus:~# uci show network
network.loopback=interface
network.loopback.proto='static'
network.loopback.ipaddr='127.0.0.1'
network.loopback.netmask='255.0.0.0'
network.loopback.ifname='lo'
network.globals=globals
network.globals.ula_prefix='fd55:c3f2:02a6::/48'
network.lan=interface
network.lan.force_link='1'
network.lan.type='bridge'
network.lan.proto='static'
network.lan.ip6assign='60'
network.lan._turris_mode='managed'
network.lan.ifname='lan0' 'lan1' 'lan2' 'lan3' 'lan4'
network.lan.ipaddr='192.168.0.1/22'
network.wan=interface
network.wan.proto='pppoe'
network.wan.ipv6='1'
network.wan.username='username'
network.wan.ifname='eth2.100'
network.wan.password='password'
network.wan6=interface
network.wan6._orig_ifname='@wan'
network.wan6._orig_bridge='false'
network.wan6.proto='pppoe'
network.wan6.username='oldISPusername'
network.wan6.password='oldISPpassword'
network.wan6.ipv6='auto'
network.wan6.ifname='@wan'
network.@route[0]=route
network.vpn0=interface
network.vpn0.proto='none'
network.vpn0.auto='1'
network.vpn0.ifname='tun0'
network.vpn_turris=interface
network.vpn_turris.enabled='0'

I see you are using TOS 5.4.4. This is an unsupported version; please upgrade to TOS 6.

Can you delete this line from your /etc/config/network file and restart the network with service network restart?

I’ve tried upgrading, it failed, and I haven’t got around to trying again; bringing it in now just adds noise to the current problem. Surely the possible causes of such a bizarre routing failure have not changed remarkably in one OS jump? Believe me, I want to upgrade and will in time, but on my last attempt I lost all connectivity to the router and had to roll back to a snapshot with the reset button. That wasn’t fun. On the first attempt that did not happen, but my lighttpd confs were broken, so I scheduled a second try. When I tried again many weeks later, the upgrade cut the Omnia off entirely. Each attempt costs me hours of time, effort and downtime, so I schedule them cautiously, and I now have a second Omnia on hand (worth buying) to try it on :wink:

Thanks for the tip on /etc/config/network. I just tried it with great hope. The WAN interface went down and came back up, taking about 30 seconds to configure, but once it returned nothing had changed from my original post above.

In the LuCI interface, my WAN IP is listed under the LAN interface as the IPv4 gateway. I did not take reference screenshots of the LAN interface pages before churning, only of the WAN interface pages.

It’s puzzling. I shall have to try the other Omnia next, and if it works I can start migrating configs to it, or inspect and compare configs. Alas, it’s all very time-consuming and is slowly draining me.

Regarding the upgrade: did you do a version-by-version upgrade (5.4.4⇨6.0⇨6.0.1⇨…) or try to hop from 5.4.4 directly to 6.2.3? Unfortunately, skipping versions in OpenWrt will every now and then lead to a broken system, I am afraid. The reason is that the scripts that help convert a config might only ship with 6.0 and not with 6.0.1 (e.g. the script for preparing /etc/config/network is available in TOS 6.0, but may not be available in 6.2.3 if you upgrade directly from 5.4.4).

Another option to get rid of all potential issues: flash a fresh TOS 6.2.3 and see whether that solves the problem. Then take snapshots and compare them with the upgraded setup.
But please save your time and don’t do trial and error with an outdated version :wink:

Not sure what I did any more, but thanks for the tip. I do have notes and can consult them. But my Omnia is heavily configured, and migration is never a quick or cheap thought. I did invest in a second one to facilitate this.

But that is a digression. I have an excellent router and a fine OS that has worked beautifully for a decade, and it worked for an afternoon on this new ISP before suddenly ceasing to route from LAN to WAN. It strikes me there must be a diagnosable reason for this.

I notice that on the LuCI LAN interface page the listed gateway is a WAN address, but not my IP. If I traceroute that address from the Omnia it is one hop away, so I imagine it is my ISP’s gateway. I cannot help but wonder whether the LAN needs an explicit bridge to the WAN, or needs to identify the Omnia itself as the gateway, or some such thing.

I always see these painful experiences as learning opportunities too, and yes, as fatigue sets in and wasted time mounts, workarounds become more and more attractive. But the questions remain… The internet is full of posts about OpenWrt and LAN-to-WAN routing, a lot of reading, but none I’ve skimmed have helped: they are always specific issues with poorly described, narrow solutions for experimental configs.

This is wholly different. Working fine. Rebooted. And LAN no longer routes to WAN. Roll back to a clean snapshot and connect the WAN, and still LAN no longer routes to WAN. A deep mystery, which almost points at the ISP. But I cannot imagine any way the ISP could affect the Omnia routing packets from LAN to WAN, even if they wanted to.

I also have a feeling the traceroute may be telling. It lists the gateway (the Omnia) as both the first and the second hop, whereas a traceroute to any device on the LAN shows only one hop. Something fishy there.
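The repeated `_gateway` hop can be checked mechanically. Linux traceroute’s default UDP probes normally stop at the hop that answers port-unreachable, since that is how they recognise the destination. So one hedged reading of the two-hop trace is: hop 1 is the Omnia’s normal TTL-exceeded reply, and at hop 2 the Omnia answered port-unreachable itself instead of forwarding the probe. The hypothetical helpers below assume the default traceroute output with names resolved (as in this thread, not `-n`), extract the first responding address per hop, and report whether every reply came from a single address that is not the destination.

```python
import re

# A hop line starts with the hop number; take the first "(a.b.c.d)" on it.
HOP_LINE = re.compile(r"^\s*(\d+)\s+.*?\((\d{1,3}(?:\.\d{1,3}){3})\)")


def first_hop_addresses(text):
    """Map hop number -> first responding IPv4 address in traceroute output."""
    hops = {}
    for line in text.splitlines():
        match = HOP_LINE.match(line)
        if match:  # lines with only "*" (no reply) are skipped
            hops[int(match.group(1))] = match.group(2)
    return hops


def never_left(text, destination):
    """True if every reply came from one address that is not the destination."""
    addresses = set(first_hop_addresses(text).values())
    return len(addresses) == 1 and destination not in addresses
```

A trace to a LAN host answers from the host itself, so it is not flagged; the client’s trace to 8.8.8.8, answered only by 192.168.0.1, is.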

Well, before retiring I have confirmed that my spare Omnia connects to the WAN and does forward LAN to WAN (i.e. the laptop connected to it has WAN connectivity).

It is Turris OS 4 though: a step backward in versions, not forward. Comparing /etc/config/network I see no difference, so I checked the firewall, and that did look different. Closer comparison will take time, as the two routers are in different locations for now. The firewall on the Omnia that won’t forward LAN to WAN looks a bit suspicious to me, but not conclusively:

I’ll have to do closer comparisons later. This leaves me with a conundrum: I in no way want to risk the one functional Omnia with a string of upgrades or a reflash; I want to be online ASAP with minimal risk. I have power-cycled the dysfunctional Omnia and it is still not forwarding LAN to WAN.
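For the closer comparison, one low-risk approach is to save the output of `uci show firewall` (and `uci show network`) from each router to a text file and diff them key by key on another machine. A minimal sketch; the script and file names are hypothetical, and anonymous sections such as `firewall.@rule[12]` are numbered by position, so the same rule sitting at a different position shows up as a difference. Treat the output as a first pass, not a verdict.

```python
import sys


def parse_uci_show(text):
    """Parse `uci show` output (one key=value per line) into a dict."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if "=" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split("=", 1)
        config[key] = value
    return config


def diff_uci(left, right):
    """Yield (key, left_value, right_value) for every key whose value differs.

    A key missing on one side is reported with None for that side.
    """
    for key in sorted(set(left) | set(right)):
        if left.get(key) != right.get(key):
            yield key, left.get(key), right.get(key)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python3 uci_diff.py broken.txt working.txt
    with open(sys.argv[1]) as broken, open(sys.argv[2]) as working:
        before = parse_uci_show(broken.read())
        after = parse_uci_show(working.read())
    for key, a, b in diff_uci(before, after):
        print(f"{key}: {a} | {b}")
```

Splitting on the first `=` only keeps values that themselves contain `=` intact, since UCI keys never do.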

It’s somehow frightening how such odd issues can arise and defy explanation.

And it is fixed! Before retiring. That will help me sleep.

While looking through the firewall rules, I noticed I have many: quite some lint collected over the years, mostly opening selected ports for traffic from selected sources or for selected protocols. Not too bad, but a full page in LuCI.

At the bottom of them all was a suspicious rule: no name, and empty. I looked at it and it made no sense, just an empty rule. I deleted it, and bang, suddenly the whole LAN was connected again, just as it had been all afternoon.
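A rule like this can be hunted for without scrolling through LuCI by scanning /etc/config/firewall for `config rule` sections that have no options at all, or no `name`. A minimal sketch of a parser for the simple UCI file syntax seen in /etc/config/network above (`config <type> ['name']` headers followed by `option` and `list` lines); it leans on shlex for quoting and is an illustration, not the real uci parser. The reported index counts `rule` sections in file order, which is intended to match the `@rule[N]` numbering that `uci show firewall` uses; confirm there before deleting anything. The thread does not establish why an empty rule broke forwarding, so the sketch only finds such rules.

```python
import shlex
import sys


def parse_uci_file(text):
    """Parse simple UCI config syntax into a list of section dicts.

    Each section looks like {"type": str, "name": str or None, "options": dict};
    `list` entries are collected into Python lists.
    """
    sections = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        words = shlex.split(line)  # handles the usual '...' quoting
        if words[0] == "config" and len(words) >= 2:
            name = words[2] if len(words) > 2 else None
            sections.append({"type": words[1], "name": name, "options": {}})
        elif words[0] in ("option", "list") and len(words) >= 2 and sections:
            value = words[2] if len(words) > 2 else ""
            options = sections[-1]["options"]
            if words[0] == "list":
                options.setdefault(words[1], []).append(value)
            else:
                options[words[1]] = value
    return sections


def suspicious_rules(sections):
    """Return (rule_index, reason) for rule sections that are empty or unnamed."""
    found = []
    rule_index = 0  # counts 'rule' sections in file order, like @rule[N]
    for section in sections:
        if section["type"] != "rule":
            continue
        if not section["options"]:
            found.append((rule_index, "empty"))
        elif "name" not in section["options"]:
            found.append((rule_index, "unnamed"))
        rule_index += 1
    return found


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python3 find_rules.py firewall  (a copy of /etc/config/firewall)
    with open(sys.argv[1]) as handle:
        for index, reason in suspicious_rules(parse_uci_file(handle.read())):
            print(f"@rule[{index}]: {reason}")
```

On the router, a rule found this way could then be removed with `uci delete firewall.@rule[N]`, `uci commit firewall` and `/etc/init.d/firewall restart`, where N is the reported index.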

This blows my mind! Somehow, somewhen, this rule came into being. It almost certainly predates yesterday’s snapshot, though, as I rolled back to that while trying to fix this. But surely it cannot, because if it had caused the issue I would have had the issue yesterday. I remain twilight-zoned here, hearing spooky music… it bamboozles me.

It might be time to upgrade your “fleet” :wink: Since you have two, I would bite the bullet: update one to TOS 6, reset it, and then add the special sauce required/desired for your network one piece at a time. That should get rid of some technical debt and give you a better starting point for the next “octcade” :wink: (I did that step between TOS 3 and TOS 4, and I think I should do it again at the latest with TOS 7, when the firewall will likely switch to nftables…)

Agreed. Given that one is clean and I now understand the WAN configuration for my new ISP, reflashing the backup with the latest Turris OS may be a plan.

It’d be wonderful if I could find a cheap 6-port physical (manual) gigabit LAN switch. 100 Mbps 2-port ones are being dumped like you would not believe, and gigabit 2-port ones are easy enough to get too. But alas, these routers aren’t situated in an easily accessible or comfortable place, and working manual cable switches there is a pain.

Still, any idea when Turris OS 7 is expected to arrive? I am so snowed under that a year can pass before such a TODO trickles to the top, not least after a whole day of time and energy drained by an ISP churn and its attendant surprises (i.e. I won’t be reflashing tomorrow and will be turning to other distractions again). So, like you, I may find it worth waiting for Turris OS 7.

But yes, I bit the bullet one year and got a second one, specifically to facilitate this sort of upgrade but also with a view to running a warm or hot spare. That’s a back-burner idea, as it’s neither simple nor urgent given the wonderful and life-saving schnapps (which let me down on the issue posted here, for reasons I still find perplexing)…

I am already on TOS 7.0 via the HBL branch, and it works in general. But honeypots and Docker don’t work because of the switch from iptables to nftables.

Really off-topic, but I would be curious to hear your feedback on TOS 7… I am reluctant to go for it on my MOX and hit tons of trouble, but I am very keen to learn nftables…

Start a new topic with questions about TOS 7.0 and I will be happy to provide feedback. I check development in the HBL branch on GitLab from time to time, and lately the devs have focused on fixing bugs that people encountered in TOS 6.x.

There is not much new in the HBL branch. I guess TOS 7.0 is still far in the future.