Only 4-bit (0-15) VLANs allowed with TurrisOS 4.0.1 on Turris Omnia

ErikCarlseen · November 15, 2019, 1:26am

It looks like the OpenWRT configuration management back-end (/bin/check_board and /etc/board.d/01_network, which generate /etc/board.json) is not properly detecting the Marvell 88E6176 switch. This in turn breaks many VLAN-related things as soon as you need a VLAN with an ID higher than 15, whether you’re editing configuration files (/etc/config/network) directly or trying to use the LUCI web interface. It looks like OpenWRT assumes a default switch configuration that’s limited to 4-bit VLAN IDs unless /etc/board.json tells it otherwise. Additionally, the switch ports in the LUCI web interface (and presumably elsewhere) are misnumbered. This looks like it should be very straightforward to correct for someone who knows what they are doing. Sadly, this person is not me.

In the meantime, I’m using software bridging instead of switching. This is sloppy.

UPDATE: It looks like LUCI switch / VLAN configuration is broken due to the removal of the swconfig command.

protree · November 15, 2019, 8:27am

Turris OS 4.0 switched switch and VLAN configuration from swconfig to DSA due to changes in upstream OpenWRT and the Linux kernel. VLANs can now be configured in the interface configuration tab in LuCI.

E.g. if you want an interface to be present on the LAN 4 port with VID 10, you have to add the virtual network interface lan4.10 to the interface’s bridge. This will not create a software bridge, but will set up the switch chip using DSA.

See explanation from cynerd here: Turris OS 4.0 beta1 is released!

He also provided a link with a background explanation of the DSA system. DSA was created to configure switch chips using standard Linux tools and to eliminate the need for a tool like swconfig, which always required users to set up their switch manually AFTER they had already set up their interfaces using standard Linux tools.

ErikCarlseen · November 15, 2019, 6:45pm

Well, I appreciate that it is supposed to work that way, but I just spent an entire day with it not functioning as documented in 4.0 (worked fine in 3.x). Here you can see the mess in LUCI:

Switch ports are not numbered correctly (not a big deal, other than it being an indication that the software is deeply confused), and it rejects the VLAN IDs. Digging into the LUCI code shows that it appears to first check board.json for info, and failing that falls back to using swconfig to query the switch. If it can’t find either, it uses a default configuration (ports numbered 1-5, plus a CPU interface, and 4-bit VLANs) as shown. So what you’re seeing is basically LUCI throwing up its virtual hands and saying “OK, whatever, I’ll try to deal with this random situation.” If this happens to work in certain instances it’s still not the same as it functioning as intended.
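
The lookup order described above can be sketched roughly as follows. This is only an illustration of the fallback chain, not LUCI’s actual code: the function name switch_info_source is made up, and treating a "switch" entry in board.json as “the switch is known” is an assumption.

```shell
#!/bin/sh
# Illustration only: mimics the described lookup order, not LuCI's real code.
# switch_info_source is a hypothetical helper name.
switch_info_source() {
    board_json="${1:-/etc/board.json}"
    # 1. Prefer board.json if it mentions a switch (assumed criterion)
    if [ -r "$board_json" ] && grep -q '"switch"' "$board_json"; then
        echo "board.json"
    # 2. Otherwise fall back to swconfig, if the binary exists
    elif command -v swconfig >/dev/null 2>&1; then
        echo "swconfig"
    # 3. Otherwise use the built-in default (ports 1-5 plus CPU, VLAN IDs 0-15)
    else
        echo "default"
    fi
}
```

On a system where swconfig has been removed and board.json carries no switch information, this would be expected to end in the last branch, which matches the symptoms described here.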

Like many users with a reasonable background in Linux I consider LUCI to be a convenience and not a necessity, but switch configuration was not functioning as documented (by either Turris or OpenWRT) in /etc/config/network. Digging into the code, this also appears to center around the software’s belief that only 4-bit VLAN IDs are allowed on this switch (not true). So while the following lines will cause the LUCI interface to indicate the appearance of VLANs, they appear to do exactly nothing as far as switch configuration goes:

config switch
	option name 'switch0'
	option reset '1'
	option enable_vlan '1'
	option enable_vlan4k '1'

config switch_vlan
	option device 'switch0'
	option vlan '1'
	option ports '4 5'

config switch_vlan
	option device 'switch0'
	option vlan '150'
	option ports '4t 5t'

config switch_vlan
	option device 'switch0'
	option vlan '3080'
	option ports '0 1 2 4t 5t'

config switch_vlan
	option device 'switch0'
	option vlan '4000'
	option ports '3 4t 5t'

Packets don’t flow as expected between ports, and none of the available command line tools that I’m aware of (I’m using “bridge” and “ip link”; there doesn’t appear to be a great replacement for swconfig yet) indicate that VLANs are configured on the ports. Indeed, it won’t even configure the existence of any LANx ports unless they’re explicitly added to a bridging configuration like so:

config interface 'lan'
	option type 'bridge'
	option proto 'static'
	option ipaddr '<>'
	option netmask '<>'
	option gateway '<>'
	option broadcast '<>'
	option dns '<>'
	option ifname 'eth2 lan3 lan4.4000'

This causes packets to flow as expected when used (I think!) in conjunction with the undocumented “enable_vlan4k” configuration option for the switch. The desired behavior is the creation of a software bridge between eth2 (separate WAN port) and vlan4000 on the switch cpu interface (eth0), with lan4 acting as an 802.1q trunk port with VLAN tag 4000 allowed and lan3 acting as a switch access port with the access (untagged) VLAN set to 4000. But from the shell prompt it really looks like everything is happening in software:

bridge -d vlan show dev lan3
port vlan ids
lan3 1 PVID Egress Untagged

I would expect to see:

bridge -d vlan show dev lan3
port vlan ids
lan3 4000 Egress Untagged

I can accomplish this manually by using these commands:

bridge -d vlan del vid 1 dev lan3 untagged
bridge -d vlan add vid 4000 dev lan3 untagged

So perhaps something is going on behind the scenes that is setting up the switch properly, but from the command line it really looks like this is not the case.
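
For reference, a minimal sketch of that manual workaround as a reusable helper. It only prints the commands so they can be reviewed before being run as root; the function name access_port_cmds is hypothetical. It also adds pvid, on the assumption that untagged frames arriving on the port should be classified into the new VLAN, which the two commands above do not do.

```shell
#!/bin/sh
# Print (do not execute) the bridge commands that turn a port into an
# access port for a given VLAN. access_port_cmds is a hypothetical helper.
access_port_cmds() {
    port="$1"
    vid="$2"
    old_vid="${3:-1}"   # VLAN to remove first; 1 is the bridge default

    # Reject anything that is not a plain number
    case "$vid" in
        ''|*[!0-9]*) echo "invalid VLAN ID: $vid" >&2; return 1 ;;
    esac
    # Usable 802.1Q VLAN IDs are 1..4094 (12 bits, 0 and 4095 reserved)
    if [ "$vid" -lt 1 ] || [ "$vid" -gt 4094 ]; then
        echo "VLAN ID out of range: $vid" >&2
        return 1
    fi

    echo "bridge vlan del dev $port vid $old_vid"
    # pvid: ingress untagged frames join this VLAN; untagged: strip tag on egress
    echo "bridge vlan add dev $port vid $vid pvid untagged"
}

access_port_cmds lan3 4000
```

The result can be checked afterwards with bridge vlan show dev lan3.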

protree · November 15, 2019, 8:36pm

This LuCI tab is non-functional and unsupported in Turris OS 4.0 because it relies on swconfig, which is unsupported and no longer needed, as it was replaced by DSA in Turris OS 4.0, as I wrote…

This also configures swconfig…

This is expected and is the correct way to go.

I believe this is the case as far as I understand the design of dsa. Take a look at the link to kernel dsa documentation that was posted by cynerd.

Again, dsa integrates into standard linux network tools like „bridge“

EDIT: Quote from mentioned kernel documentation:

The original philosophy behind this design was to be able to use unmodified
Linux tools such as bridge, iproute2, ifconfig to work transparently whether
they configured/queried a switch port network device or a regular network
device.

anon50890781 · November 15, 2019, 8:55pm

It took me a while to grasp the concept of DSA (switch management) after reading its documentation. For me the simplified essence is:

- DSA manages smart switches that support one of the five DSA tag protocols ^[1], to the extent that it:
  - identifies which port the Ethernet frame came from/should be sent to
  - provides a reason why this frame was forwarded to the management interface
- in that spirit it does not require 802.1Q (VLAN) tagging anymore, whereas previously it was required since the switch was treated/considered as dumb
- 802.1Q (VLAN) tagging up/downstream (in/egress) is not handled by DSA but instead by the kernel’s 8021q network stack ^[2]
- configuration management of 802.1Q is done with the userland iproute2 portfolio: ip and/or bridge
- bridge provides more advanced capabilities for configuration management of 802.1Q than ip
- VLAN filtering on a bridge provides VLAN isolation, and consequently the number of software bridges can be reduced
- VLAN filtering requires:
  - a kernel compiled with CONFIG_BRIDGE_VLAN_FILTERING=y
  - to be enabled via /sys/class/net/<bridge>/bridge/vlan_filtering
- UCI (and thus LuCI) has no parsing routine for configuring 802.1Q (VLAN) tagging with bridge or configuring VLAN filtering
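
A small sketch for checking the VLAN filtering requirement on a given bridge, assuming the sysfs path mentioned above. The function name vlan_filtering_state and its optional second argument (an alternative sysfs root, only there so the helper can be tried without a real bridge) are hypothetical.

```shell
#!/bin/sh
# Report whether VLAN filtering is enabled on a bridge by reading sysfs.
# vlan_filtering_state is a hypothetical helper; the second argument is an
# optional sysfs root used only for testing.
vlan_filtering_state() {
    bridge="$1"
    root="${2:-/sys}"
    f="$root/class/net/$bridge/bridge/vlan_filtering"

    # Missing file: not a bridge, or kernel built without VLAN filtering
    if [ ! -r "$f" ]; then
        echo "unknown"
        return 1
    fi

    case "$(cat "$f")" in
        1) echo "enabled" ;;
        0) echo "disabled" ;;
        *) echo "unknown"; return 1 ;;
    esac
}

# Example (on the router):
#   vlan_filtering_state br-lan
# Enabling it, as root, is done with either of:
#   echo 1 > /sys/class/net/br-lan/bridge/vlan_filtering
#   ip link set dev br-lan type bridge vlan_filtering 1
```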

^[1] https://www.kernel.org/doc/html/latest/networking/dsa/dsa.html#switch-tagging-protocols
^[2] https://github.com/torvalds/linux/tree/master/net/8021q

anon50890781 · November 15, 2019, 9:21pm

That is just default. Unless otherwise configured it will show

1 PVID Egress Untagged

for any/all ifaces, just run bridge vlan

The switch is managed by DSA with its five supported tag protocols. For 802.1Q (VLAN) tagging it requires ip or bridge as you already figured out.

anon50890781 · November 15, 2019, 10:35pm

Limited to what ip l provides

That does not seem correct. DSA is not triggered/invoked by the presence of a VLAN tag. According to ^[1]

probe routine which will be invoked by the DSA platform device upon registration to test for the presence/absence of a switch device. For MDIO devices, it is recommended to issue a read towards internal registers using the switch pseudo-PHY and return whether this is a supported device. For other buses, return a non-NULL string

Perhaps depends on the definition of what standard linux tools are, just according to ^[1]

setup function for the switch, this function is responsible for setting up the dsa_switch_ops private structure with all it needs: register maps, interrupts, mutexes, locks etc…

I have not found a standard Linux userland tool that calls the setup function and provides parsing for the stated parameters.

^[1] Architecture — The Linux Kernel documentation

ErikCarlseen · November 16, 2019, 12:27am

OK, we’re getting way off-topic here. The point is that TurrisOS 4 does not configure VLANs on the switch, period, no matter what documented setup methodology you use. It will configure VLANs on an interface, which is not necessarily the same thing (hardware forwarding vs software forwarding). Unless you write a script from scratch you will be shuffling packets with the CPU that should be offloaded to the switching hardware.

I pulled out a MOX and created a simplified example:

config interface 'test_vlan_101'
	option type 'bridge'
	option ifname 'lan3 lan4.101'
	option proto 'static'
	option ipaddr '172.16.68.90'
	option netmask '255.255.255.224'
	option gateway '172.16.68.65'
	option _turris_mode 'managed'

The test device (172.16.68.91) can ping the Turris MOX (172.16.68.90) just fine, but it cannot ping the firewall (172.16.68.65) or any other device on the test VLAN that’s on the other side of the MOX. Packet captures using tcpdump show that the Turris MOX is forwarding broadcast traffic out lan4.101, but is for whatever random reason filtering ARP packets.

The MOX (172.16.68.90) can ping everything on the test VLAN, regardless of whether it’s the test device out lan3 or any of the devices on the other side of the Cisco switch out of lan4.101.

In any case, if the packets were being forwarded in hardware this should not be an issue. It appears that packets are being forwarded (and improperly at that) in software. That DSA requires the use of the iproute2 tools to manage things seems to obfuscate the process a bit, but the bridge and brctl tools seem to indicate (and the above experiment confirms) that packet forwarding is in software.

anon50890781 · November 16, 2019, 8:45am

It seems this was initially introduced by ^[1].

This can be queried with bridge fdb and should show the “offload” label.
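
A quick way to run that check is to summarise the bridge fdb output per device. The sketch below is only a text filter: the function name offload_summary is hypothetical, and it assumes each entry carries its device after the dev keyword and that offloaded entries contain the word offload as a separate field.

```shell
#!/bin/sh
# Summarise "bridge fdb show" output read from stdin as
# "<device> <offloaded entries>/<total entries>", one line per device.
# offload_summary is a hypothetical helper name.
offload_summary() {
    awk '{
        dev = "?"
        # the device name follows the "dev" keyword
        for (i = 1; i < NF; i++) if ($i == "dev") dev = $(i + 1)
        total[dev]++
        # count entries flagged with the "offload" keyword
        for (i = 1; i <= NF; i++) if ($i == "offload") off[dev]++
    }
    END { for (d in total) printf "%s %d/%d\n", d, off[d] + 0, total[d] }' | sort
}

# Example (on the router):
#   bridge fdb show | offload_summary
```

A device reporting 0 offloaded entries out of several would correspond to the label being absent on that port.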

According to ^[2] ^[3], DSA leverages the switchdev framework to offload features to the device (switchdev obj for VLAN add/del ops). Looking into ^[4], it is stated:

To offloading L2 bridging, the switchdev driver/device should support:

- Static FDB entries installed on a bridge port
- Notification of learned/forgotten src mac/vlans from device
- STP state changes on the port
- VLAN flooding of multicast/broadcast and unknown unicast packets

Not sure whether all those requirements are met by the device/driver.

Do you have vlan filtering enabled ^[4]?

Note: by default, the bridge does not filter on VLAN and only bridges untagged traffic. To enable VLAN support, turn on VLAN filtering:

echo 1 >/sys/class/net/<bridge>/bridge/vlan_filtering

With the above it seems that

is not correct, since apparently there is a difference between hardware 802.1Q tagging (via DSA / switchdev framework / device driver) and software 802.1Q tagging.

^[1] net: dsa: mv88e6xxx: add support for VLAN Table Unit [LWN.net]
^[2] Architecture — The Linux Kernel documentation
^[3] Architecture — The Linux Kernel documentation
^[4] https://www.kernel.org/doc/Documentation/networking/switchdev.txt

protree · November 16, 2019, 11:27am

As far as I understand DSA, „DSA ports“ (the „switch ports“) are represented as normal PHYs that can be configured using standard network tools like bridge or ip. Changes to a PHY then trigger switchdev functions to set up the switch chip according to how the PHY was configured. This means that no userland Linux network tool needs to call switchdev directly, because of this abstraction.

It was stated multiple times on this forum by Turris team members like cynerd that there is no loss of functionality with the switch to DSA. It was also stated that offloading to the switch chip is supported and happening. I don’t see why this should be wrong. I will do some more testing for myself once I can spend some time.

Nevertheless, I’d love to see some official documentation for this, as right now I can only find kernel documentation that isn’t specific to OpenWRT/Turris, or some documentation on the OpenWRT wiki that is way too superficial and mixed up with the old swconfig setup…

anon50890781 · November 16, 2019, 11:54am

That works only to the extent of what userland provides, logically. With those userland tools, however, it does not seem possible to change the default VLAN ID of a specific port, since it is programmed in an eprom register. ^[1]

As a result the system log prints on this node (TOS 4.x | 5.x but not on 6.x):

mv88e6085 f1072004.mdio-mii:10 lan0: configuring for phy/gmii link mode
mv88e6085 f1072004.mdio-mii:10: p0: hw VLAN 1 already used by br-guest
mv88e6085 f1072004.mdio-mii:10 lan1: configuring for phy/gmii link mode
mv88e6085 f1072004.mdio-mii:10: p1: hw VLAN 1 already used by br-guest
mv88e6085 f1072004.mdio-mii:10 lan2: configuring for phy/gmii link mode
mv88e6085 f1072004.mdio-mii:10: p2: hw VLAN 1 already used by br-guest
mv88e6085 f1072004.mdio-mii:10 lan3: configuring for phy/gmii link mode
mv88e6085 f1072004.mdio-mii:10: p3: hw VLAN 1 already used by br-guest

Indeed, in general there is no such loss, except that UCI has no parsing routine for bridge vlan add dev <netdev> vid <id> <options> and thus needs a manual workaround.

Same for enabling vlan filtering on a bridge.

Whilst it was mentioned, no proof was provided that offloading actually happens. Running bridge fdb on this node, the offload label is not present on any port, which would however be expected after reading ^[2]

On SWITCHDEV_FDB_ADD, the bridge driver will install the FDB entry into the
bridge’s FDB and mark the entry as NTF_EXT_LEARNED. The iproute2 bridge
command will label these entries “offload”:

Looking forward to your findings.

I suppose the only specifics for the repos would be what is supported by UCI, and how (and subsequently in the respective UIs, LuCI / Foris).

^[1] drivers/net/dsa/mv88e6131.c - kernel/msm - Git at Google
^[2] https://www.kernel.org/doc/Documentation/networking/switchdev.txt

protree · November 16, 2019, 12:06pm

I see what you mean… It’s not even present for the default br-lan bridge on lanX… As I said, I will have a look. Thanks

EDIT: Which may be because I only have one device connected to a LAN port at the moment; I can’t check what happens if I connect another one right now…

anon50890781 · November 16, 2019, 12:17pm

Querying the two management ports (eth0|1) and the front panel ports (lan0|1|2|3|4) with:

ethtool -k <dev>| grep offload

tcp-segmentation-offload: on
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]

ethtool -k <dev>| grep vlan

rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]

This would indicate that there is not much offloading possible with the device/driver, and also that the VLAN features are turned off.
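
To compare all ports at a glance, the ethtool output can be reduced to the features that are actually on. A minimal sketch, assuming the "name: state" line format shown above; the function name enabled_features is hypothetical.

```shell
#!/bin/sh
# Read "ethtool -k <dev>" output on stdin and print only the names of
# features whose state is "on". enabled_features is a hypothetical helper.
enabled_features() {
    awk -F': ' '$2 ~ /^on( |$)/ {
        sub(/^[ \t]+/, "", $1)   # drop indentation of sub-features
        print $1
    }'
}

# Example (on the router), looping over management and front panel ports:
#   for dev in eth0 eth1 lan0 lan1 lan2 lan3 lan4; do
#       echo "== $dev"; ethtool -k "$dev" | enabled_features
#   done
```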

I found this ^[1] to be somewhat of a curious jumble:

For hardware offloading, when a flow is constructed, the driver implements a callback to send it to hardware. If the hardware supports it, it is added there, otherwise it is added to the software flow table. Hardware offloading is not upstream yet because there is no driver using it yet. In OpenWRT, the flow offloading is extended to support bridges, VLANs and PPPoE.

How is that supposed to be working if the query into the hardware feature shows no support for it?

^[1] https://www.mind.be/openwrtsummit18/2018-10-29-OpenWRT18-network-offloading.html

ErikCarlseen · November 17, 2019, 11:07pm

Ugh - digressing to whine about the continued “Microsoft-ization” of Linux: the replacement of small, discrete tools with monolithic “all-in-one, trust-the-system-it-knows-what-its-doing” unmaintainable bug-ridden monstrosities that offer little insight and less control (See also: systemd).

The above comment is not directed at Turris or OpenWRT (except maybe the udev part), but at the Linux ecosystem in general lately.

Back to the problem at hand: I don’t see how the kernel or any of the configuration toolchain involved would be setting up the switch correctly if the switch is not properly identified or acknowledged by said tools. The fact that the system’s assumed default switch is similar to what’s in the Omnia (but not the MOX with module E) offers some glimmer of hope, but all signs point to this not being the case. Which goes back to my original supposition that if any of this functions even semi-correctly on top of OpenWRT 18.x it’s purely by blind luck.

anon50890781 · November 18, 2019, 9:39am

Reading from the syslog (mind, this is on TOS 6.x with kernel 4.19.82), it would appear that the kernel probes for and identifies, through the PHY/MDIO framework,

kernel: libphy: Fixed MDIO Bus: probed
kernel: libphy: orion_mdio_bus: probed
kernel: libphy: mv88e6xxx SMI: probed
kernel: mv88e6085 f1072004.mdio-mii:10: switch 0x1760 detected: Marvell 88E6176, revision 1
kernel: mvneta f1070000.ethernet eth0: Using hardware mac address d8:58:d7:00:79:7c
kernel: mvneta f1030000.ethernet eth1: Using hardware mac address d8:58:d7:00:79:7a
kernel: mvneta f1034000.ethernet eth2: Using hardware mac address d8:58:d7:00:79:7b
kernel: mv88e6085 f1072004.mdio-mii:10 lan0 (uninitialized): PHY [mv88e6xxx-1:00] driver [Marvell 88E1540]
kernel: mv88e6085 f1072004.mdio-mii:10 lan1 (uninitialized): PHY [mv88e6xxx-1:01] driver [Marvell 88E1540]
kernel: mv88e6085 f1072004.mdio-mii:10 lan2 (uninitialized): PHY [mv88e6xxx-1:02] driver [Marvell 88E1540]
kernel: mv88e6085 f1072004.mdio-mii:10 lan3 (uninitialized): PHY [mv88e6xxx-1:03] driver [Marvell 88E1540]
kernel: mv88e6085 f1072004.mdio-mii:10 lan4 (uninitialized): PHY [mv88e6xxx-1:04] driver [Marvell 88E1540]
kernel: DSA: tree 0 setup

the installed switch, and thus this does not seem to be blind luck or an assumption about the board platform.
And since DSA leverages the MDIO/PHY library, this should be fine then.

According to the output from ethtool, there are various features not supported by the switch device or its driver, which are thus presumably (logically) handled in software by the CPU instead.

The 88E1540 chip would appear to be embedded within the 88E6176 chip; it handles the physical layer functions of its Gigabit Ethernet transceivers (LAN front panel ports) and is thus irrelevant for DSA.

As for the 88E6176-A1-TFJ2 switch chip (in this node) I could not find a product brief or data sheet about its hardware offload capabilities or hardware handling (insert | parse | filter) of VLAN tags.

anon50890781 · November 20, 2019, 3:22pm

In view of

Opened

but that has been closed as invalid for:

- not being constructive
- not providing anything of value
- being an unusable dump

If anyone has better input and feels like chiming in on GitLab, please do not hesitate; perhaps the developer team could be convinced otherwise.