TOS5 VLAN Nightmare

I have spent a week trying to configure VLANs on TOS5 without any luck. What worked smoothly on TOS3.x is a pain in the a** here :frowning:

What I want to do is nothing special: just two separate LAN segments, one for the main network and one for guests. But because I have disabled Wi-Fi on the Turris and use two Mikrotik APs for my Wi-Fi, I have to use a TRUNK port to get their traffic to the Turris. And of course I want DHCP on both networks…

At first I configured a second bridge for guests, assigned it an IP, set up DHCP, and split lan4 into lan4.10 and lan4.20 to get the VLANs.
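For anyone who wants to reproduce this, a minimal sketch of such a setup with the uci command line on TOS5 (OpenWrt 19.07) could look like the following. It assumes VLAN 10 is the main LAN, VLAN 20 is for guests and lan4 is the trunk to the APs; the section name guest, the 192.168.20.0/24 subnet and the DHCP pool values are hypothetical, and firewall zones for the guest network are left out.

```sh
# Hypothetical sketch: VLAN 10 -> main LAN, VLAN 20 -> guests, lan4 = trunk port.
# netifd creates the 802.1q sub-interfaces from the "port.vid" names in ifname.

# Main LAN bridge: untagged access ports plus tagged VLAN 10 from the trunk
# (assumes plain lan4 is no longer a member of the lan bridge)
uci set network.lan.ifname='lan0 lan1 lan2 lan3 lan4.10'

# Second bridge for guests, fed by tagged VLAN 20 from the trunk
uci set network.guest=interface
uci set network.guest.type='bridge'
uci set network.guest.ifname='lan4.20'
uci set network.guest.proto='static'
uci set network.guest.ipaddr='192.168.20.1'
uci set network.guest.netmask='255.255.255.0'

# DHCP pool for the guest network, served by dnsmasq
uci set dhcp.guest=dhcp
uci set dhcp.guest.interface='guest'
uci set dhcp.guest.start='100'
uci set dhcp.guest.limit='150'
uci set dhcp.guest.leasetime='12h'

uci commit network
uci commit dhcp
/etc/init.d/network restart
/etc/init.d/dnsmasq restart
```

This is the kernel-tagging approach (traffic passes through the CPU), i.e. the one whose ARP behaviour is described below.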

At first glance, things worked. Until my wife started to complain about big lags while browsing on her laptop connected via Wi-Fi. I started to dig in and discovered that pings to www.seznam.cz sometimes took 3000 ms.

Deeper analysis showed that ARP replies had stopped working. The laptop was flooding the network with ARP requests for the default gateway IP (which is the Turris IP), but no reply arrived. Sometimes, though, an ARP request sent by the Turris from the default gateway IP arrived; then the laptop's ARP table was satisfied and things worked for a while.

I traced the problem from the Wi-Fi through the Mikrotiks and finally arrived at the Turris. Everything points to the problem being here…


I completely reverted the VLAN network config on the Mikrotiks and started to check the Turris VLANs. Time for some test-lab work…

To do this, I created a test environment on my Turris:

* VLAN10: lan0, lan1.10
* VLAN20: lan2, lan1.20

I configured two bridges, g1 and g2, and assigned:

* g1: lan0, lan1.10
* g2: lan2, lan1.20

To connect to the TRUNK port, I use a laptop with a Hyper-V bridge and change the management interface VLAN tag to 10 or 20, depending on which VLAN I want to join…
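The test lab above, sketched as uci commands. The bridge addresses are hypothetical, the dhcp sections for g1 and g2 are left out, and it assumes lan0, lan1 and lan2 are first taken out of the default lan bridge (the value kept for lan below is an assumption too).

```sh
# Hypothetical: free lan0-lan2 from the default lan bridge (assumed remaining ports)
uci set network.lan.ifname='lan3 lan4'

# g1: access port lan0 + VLAN 10 from trunk port lan1
uci set network.g1=interface
uci set network.g1.type='bridge'
uci set network.g1.ifname='lan0 lan1.10'
uci set network.g1.proto='static'
uci set network.g1.ipaddr='192.168.10.1'
uci set network.g1.netmask='255.255.255.0'

# g2: access port lan2 + VLAN 20 from trunk port lan1
uci set network.g2=interface
uci set network.g2.type='bridge'
uci set network.g2.ifname='lan2 lan1.20'
uci set network.g2.proto='static'
uci set network.g2.ipaddr='192.168.20.1'
uci set network.g2.netmask='255.255.255.0'

uci commit network
/etc/init.d/network restart
```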


On the first try, DHCP worked smoothly on the access ports (lan0, lan2), but on the trunk it was a lottery. Sometimes I got a reply, sometimes not. Once I got a reply for one VLAN, the interface was "locked" to that subnet and VLAN (VLAN ID 10, for example), and the next DHCP request for VLAN 20 failed completely.

On deeper investigation I found that lan1.10 and lan1.20 have the same MAC address. Because these devices work as "strippers" of the VLAN tag, my ARP table ended up with two records with the same MAC, one for lan1.10 and one for lan1.20. That is why ARP replies can't arrive correctly: it is not clear to which VLAN they should be delivered…

So it looked logical to use different MAC addresses for these virtual devices, and things started to go well… But I haven't had luck with this approach either…
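To check the MAC observation above and to try the workaround by hand, something like the following can be used. The locally administered addresses are made up, and the change is most likely not persistent: netifd recreates the VLAN devices (with the parent's MAC) on the next network restart.

```sh
# A VLAN device inherits the MAC of its parent by default, so these usually match
cat /sys/class/net/lan1/address
cat /sys/class/net/lan1.10/address
cat /sys/class/net/lan1.20/address

# Give each VLAN device its own locally administered MAC (hypothetical values)
ip link set dev lan1.10 down
ip link set dev lan1.10 address 02:00:00:00:01:10
ip link set dev lan1.10 up

ip link set dev lan1.20 down
ip link set dev lan1.20 address 02:00:00:00:01:20
ip link set dev lan1.20 up
```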

Anyway, with this approach I'm not able to configure reliable VLANs with trunk port(s).


Some people on the forum mentioned VLAN-aware bridge(s). Maybe that will work???

It seems that nobody noticed that it is a COMPLETELY DIFFERENT technology for getting VLANs working and that it can't be mixed with the previous approach of lan1.10 virtual devices. But OK, after discovering this fact, I gave this way a try too.

This technology is very similar to classic managed switches. You create one switch (bridge), assign physical ports to it (ip link set dev <port> master <bridge>) and then define the VLAN table with bridge vlan commands.
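A rough iproute2 sketch of those three steps, reusing the port roles from the test lab (lan0 access VLAN 10, lan2 access VLAN 20, lan1 trunk). The bridge name br0 is an assumption, the full ip and bridge tools are assumed to be installed, nothing here is persistent or visible to UCI/LuCI, and netifd may put the ports back into br-lan on the next network restart.

```sh
# 1. Create the "switch": a bridge with VLAN filtering enabled
ip link add name br0 type bridge vlan_filtering 1

# 2. Attach the physical ports (this also detaches them from any previous bridge)
for port in lan0 lan1 lan2; do
    ip link set dev "$port" master br0
    ip link set dev "$port" up
done

# 3. VLAN table: access ports get an untagged PVID, the trunk carries both VLANs tagged
bridge vlan add dev lan0 vid 10 pvid untagged
bridge vlan add dev lan2 vid 20 pvid untagged
bridge vlan add dev lan1 vid 10
bridge vlan add dev lan1 vid 20

# New ports are also members of the default VLAN 1; drop it so only 10/20 remain
for port in lan0 lan1 lan2; do
    bridge vlan del dev "$port" vid 1
done

ip link set dev br0 up
bridge vlan show
```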

Additionally, when I capture packets on the bridge with tcpdump -i <bridge> -v -e, I can see VLAN tags, so it is the closest to the goal. But the problem with this technology is how to assign "VLAN access ports" for the router itself to the bridge so that DHCP works correctly for all VLANs, etc.

I tried to create virtual interfaces eth0:0 or eth1:0 with the ifconfig command, but these aliases are not visible to the bridge vlan command. I tried virtual dummy devices (sudo ip link set name eth10 dev dummy0), assigned them IPs from the required VLAN ranges and put them into the bridge… But again, no luck. I was unable to get DHCP working, and the ARP problems came back…

So even with VLAN-aware bridge(s) I'm not able to get VLANs with a TRUNK port and working DHCP.


I have spent so much time "learning" these new LAN technologies the OpenWrt way and pushing my Turris to do what I want. No luck. That is time I wanted to spend on other, original projects, which I share with the community…
I don't think I'm a beginner in networking, but I come from a slightly different world (Windows, Mikrotik). I know how things work. Anyway, with TOS3 and the switch approach my VLANs worked smoothly. With TOS5, I wasted a week trying to get things working the right way, with no luck. And looking through the Turris forum posts, I'm not alone…

I expect many disparaging comments on this post from forum users. I expect remarks that my requirements are special, need deep knowledge, that it is only for experienced users, that Turris is only for special people, etc… I have received this kind of remark twice before from NIC team members (PEPE)…

So allow me to make one remark to the NIC team as well: the OpenWrt and TOS documentation is a mix of outdated pages, contradictory recipes, and chaotic pages. It can give a headache to anybody looking there for help. And it seems that the Turris documentation is going the same way.

  • Old forum
  • New forum
  • Old doc site
  • New doc site
  • Outdated pages
  • One page says one thing, another says the complete opposite…

I know it is better to start building from scratch; it is simpler and quicker than maintaining the current state… But it is a horrible approach for all users. And remember, these users bought your expensive device expecting that it will work, that they will get some support, and that it will help them handle their personal networking tasks…

I have been using OpenWrt since the early 2000s and I have been an advocate for OpenWrt… Back then, everything was clear and up to date. Users could set things up easily by following the recipes on the OpenWrt pages. Now that's gone.

So that's it. It is time to say goodbye to Turris and OpenWrt. I have ordered an APU2 device and am going to run pfSense on it. It will be cheaper than a Turris or MOX (240 GB SSD included), and it lets me easily set up Suricata, a reverse proxy, OpenRadius and many other things in one place, with clear, maintained documentation. I briefly went through the pfSense doc site, and it is a COMPLETELY DIFFERENT approach from the Turris/OpenWrt docs.

And remember that it is the same business model as CZ.NIC's Turris/MOX business: a modified Linux distro that is free to use, plus certified hardware sold to make a profit.


My post is long. My post is a farewell. I do not expect anyone to respond with a working recipe for VLANs with a trunk and DHCP. It is just a simple reminder to CZ.NIC that devices (especially expensive devices) are here for customers and users, not for developers and people who see their customers as stupid second-class people. From the forum, I have the feeling that the CZ.NIC team has completely forgotten this. And remember, we PAID a lot of money for Turris. So any kind of maintenance and support would be really good. It is a good habit if you are selling something "exclusive"…


VLAN on TOS 5 is one of the reasons not to update from TOS 3.

@viktor Just a note: TOS5 is Linux. VLANs on Linux are nothing special, so there is no reason not to support VLANs on TOS5. But maybe there is a bug in TOS5 and its kernel, or it has to be configured in a special way. Many people on the forum tried to get VLANs working. I do not understand why nobody from the NIC team spent a day or two investigating the problem and preparing some official info… CZ.NIC must be full of networking experts with deep Linux networking knowledge. I don't think it can be a big problem for them…

Have you read up on Linux's DSA documentation (Total obsolete VLAN documentation - #14 by anon82920800 - General discussion - Turris forum)?

That aside, there are some bugs in Linux's DSA code that can cause issues. They are being fixed mostly in kernel 5.4 and up, and not all of the fixes get backported to kernel 4.14 (OpenWrt 19.07.x | TOS5.x), although the TOS devs have already implemented some patches of their own in TOS5.x.

So we have to wait for OpenWrt 20.xx and TOS 6. Kernel 5.4 will also bring better SFP support.

What is there to wait for? DSA works as intended in TOS5, and so does VLAN tag management, either by the kernel or by DSA. For the latter the user has to read up on the available documentation; there is currently no UCI parsing for it in OpenWrt or TOS, and thus no UI either.


Well, that's a statement. You are not alone, but why didn't you look through the forum before writing this post? @anon82920800 described in detail how to use DSA… (see the link, and also read the following posts).
But I totally agree that there is no official documentation from the team for it, which I consider the wrong way. And it is also wrong to always insist on referring to upstream documentation: this project was clearly promoted as delivering the necessary documentation and how-tos. And VLAN configuration definitely belongs there…


Turris Team, please add this topic to the docs. Thank you!

And thank you for your comment, @anon82920800. I just updated to TOS5. Then I looked at the documentation page to see how to configure a VLAN. I found nothing. But I found this topic here. I think VLANs need to be mentioned in the documentation, too. A good reason to "bump the topic to the top of its list".


This is already documented in our documentation. See it here:

In the past, there were details in our forum and on GitLab about how to add VLANs.


Well, this information is about how to create software VLANs, where traffic has to pass through the CPU and therefore causes load. So as long as this does not invoke bridge or ip commands (which is still not available upstream, if I'm not mistaken), it is the worse solution compared with the CLI options.
Why don't you list the exact bridge commands for hardware-based VLANs in the documentation, or at least link to the upstream manuals?
But bottom line: yes, this does work.


This issue finally broke my trust in TO.
I upgraded to the most recent TOS5 last week and have been trying since then to get my VLAN configuration running again.
Actually nothing fancy: just 3 different VLANs on the LAN4 interface, bridged to various other LAN and WLAN ports. It worked flawlessly with TOS3 and fails miserably on TOS5.

I followed this and several other threads here and learned several things:

  1. With the switch to DSA a lot changed, and broke.
  2. There are basically two ways to make VLANs work again: use the supposedly still valid TOS3 approach, where tagging is handled by the kernel (run through the CPU), or use the new DSA features, with tagging handled within the switch.
  3. The latter requires some in-depth DSA knowledge and CLI commands. This approach must be considered highly unsupported, because it is not documented anywhere (except what is contained in some threads here), it is not supported by LuCI, and it remains unclear how to persist all of it. Not to mention how it will behave on future upgrades. No option for me.
  4. The old TOS3 approach of just using LANx.y interfaces would be a feasible approach for me. I would happily accept the extra CPU cycles for the tagging overhead in exchange for a simple, working solution that is even configurable via LuCI. However, this turned out to be a dead end too. Although this way is actually documented, it really doesn't work due to several ARP issues (described here, among other places).

After all that hassle, I desperately reverted to TOS3. I'm not very confident that CZ.NIC will eventually resolve this issue. It has been known for so long, but nothing moves.
Actually a pity. I had such high hopes for TO when I backed this project in the very first Indiegogo campaign.
Currently, I can't recommend this device to anybody.

And sorry for this unproductive post, but I had a strong desire to give some feedback from a user's point of view.

Just for your information: upstream OpenWrt is in the process of switching from swconfig to DSA, which also turned out to take much longer than intended and is part of the reason why there was no release in 2020. But the mainline kernel developers decided not to accept swconfig, and DSA, which is already in mainline, is the official way. So while these delays are unfortunate and the current period of limbo is unpleasant, I think both OpenWrt and Turris are doing the right thing here. I believe we are getting close to an OpenWrt release now, which probably contains a way to configure DSA switches from the GUI, which should help simple setups.


Hi. I gave up on TOS and the Turris device months ago because of VLAN issues. I bought an APU2 and moved to pfSense. Never looked back. It's like moving from an old patched-up car to a Ferrari…

Excellent hardware is just a piece of silicon without software. And that is exactly the case with the Turris project and CZ.NIC. I do not see any strong support or TOS development here…

It seems that all the support for Turris and MOX from CZ.NIC is just recompiling community-driven OpenWrt. Nothing like what you would expect from expensive router devices such as Turris or MOX…

If you can't get things to work, move to some verified and tested solution, as I did.

[OpenWrt Wiki] Roadmap and Release Goals for 21.02: DSA LuCI support is WIP for the next release.

Let's work with this.
Can you tell me why it was a dead end for you? As soon as I changed from lan4 to WAN, everything worked just as expected. I have 3 VLAN IDs defined, too.
By the way, there is NO performance drop from using the CPU instead of invoking DSA if you have just one Ethernet cable connected (whether you carry WAN over a VLAN or use your TO as an AP only, in both cases connected to a managed switch).

You are simply wrong. There are areas where the Turris team is able to develop (e.g. MOX SDIO Wi-Fi didn't work for a long time, but they dug into it and now it works like a charm; right now there is development for 2.5 Gbps SFP and 802.11ax cards, and if they make this happen they are magicians in my eyes!) and others that can only be done upstream (DSA itself in the Linux kernel, and the UCI implementation by the OpenWrt team).
The fact that they made the SDIO Wi-Fi work restored my belief in this project, but I clearly understand the frustration, as I was frustrated several times myself and very close to selling the devices. The key is understanding what you want in terms of SOHO networking and how the Turris devices might fit into it. I ended up using them as APs only: very strong APs that might in the near future even run 802.11ax at speeds close to the standard's maximum. How cool is that for a device that was produced in 2016 and is completely FOSS! Your APUs won't be able to do that :upside_down_face:
But for routing and switching I moved to an x86 server with 10 Gbps links to 10 Gbps switches (and soon even 40 Gbps) that hosts my OpenWrt VM (alongside DokuWiki, Pi-hole, Nextcloud, a mail server and a storage server), which leaves nothing to be desired. And I use a second TO as an LTE router and AP for my allotment, where it serves as a garden-automation center and works like a charm…

WAN is used for the uplink (no VLAN there), so I can't move my internal VLANs to that port.
I need to bridge LAN4.10-LAN3-LAN2-LAN1-LAN0, as well as LAN4.11 and LAN4.12 with some WLAN ports respectively.
As stated before, the bridge has some ARP problems. A connection from a client "A" on a network attached to LAN4.10 to a host "B" on LAN2 is not established, because ARP is not resolved.
"A" broadcasts ARP requests, but "B" does not answer, even though they are on the same segment.
This problem was already discussed in great detail in the post I mentioned.
That's my dead end.

Yes, you can move it: just swap eth2 and lan4, so you get something like

config interface 'lan'
	option ifname 'lan0 lan1 lan2 lan3 eth2.10'
	option type 'bridge'
	option proto 'static'
	option ipaddr '192.168.1.1'
	option netmask '255.255.255.0'
	option delegate '0'
	option _turris_mode 'managed'

config interface '<other interface 1>'
	option ifname 'eth2.11'
	option type 'bridge'
	option proto 'static'
	option ipaddr '<other.interface1.ip.address>'
	option netmask '255.255.255.0'
	option delegate '0'

config interface '<other interface 2>'
	option ifname 'eth2.12'
	option type 'bridge'
	option proto 'static'
	option ipaddr '<other.interface2.ip.address>'
	option netmask '255.255.255.0'
	option delegate '0'

config interface 'wan'
	option ifname 'lan4'
	option proto 'dhcp'

Adapt it to your needs, then connect your WAN Ethernet cable to the lan4 port and your cable for lan/<other interface 1>/<other interface 2> to the WAN port, run /etc/init.d/network restart via SSH, and you are done.
I did the same (actually I just changed the interface names, VLAN IDs and IP addresses in the snippet above, though I do not use WAN, as my TO is a dumb but :muscle: AP). The ARP issues should be gone, as they are for me :slight_smile:
By the way, this is also an example of a situation where invoking DSA directly gives no benefit: traffic from/to WAN is processed entirely by the CPU (as no VLAN ID is set there), so there is no VLAN-tagged traffic that could be forwarded from one port to another without passing through the CPU (which would be handled as shown by @anon82920800 here).
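For those who prefer the shell, the lan/wan part of the swap in the snippet above can also be applied with uci and then checked. This is a sketch that assumes the other two VLAN interfaces are defined as in the snippet and that the full ip tool is installed; the backup file name is arbitrary.

```sh
# Keep a copy of the current config in case the new one locks you out
cp /etc/config/network /etc/config/network.bak

# lan: local ports plus VLAN 10 carried on the former WAN port (eth2)
uci set network.lan.ifname='lan0 lan1 lan2 lan3 eth2.10'
# wan: the uplink now uses the lan4 port, untagged
uci set network.wan.ifname='lan4'
uci commit network

# The restart may drop the SSH session if you are connected through a port that changes role
/etc/init.d/network restart

# Verify the VLAN device exists with the expected ID, then check the addresses
ip -d link show eth2.10
ip addr show
```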
