Update broke my router


#1

Little rant here - this year I have already spent over 3 hrs fixing fallout from broken updates. 3.8 completely broke my router - infinite reboot loop. Your updates are fragile because they rely on poorly written updater software. Sorry, but if you tried making updates immutable, i.e. one package download with everything, it would probably fix many of your current problems.

After repeated factory resets I found that I had to disable updates in the wizard and keep the original IP address range for the DHCP pool. Did you guys actually test the stock router you shipped being updated to 3.8 with “updates on” and “DHCP IP range change” in the wizard? It doesn’t look like you did, because it fails on random stuff!

I thought I had it right, but then none of my WiFi clients could lease an IPv4 address over DHCP. What kind of weird error is that? How can this happen?

I love your project, but I just can’t see how it would work for the average Joe who just turns things off and on again.


#2

My WiFi clients keep getting and losing IPv4 addresses from DHCP within a couple of seconds, which results in spotty connectivity. Does anyone know what the problem is? I’m looking in /var/log/messages and I don’t see anything out of the ordinary.


#3

Same problem?


#4

Ok, I rebooted the router and the same DHCP problem also appeared on LAN.

Looking at the client DHCP log:

Sep 16 08:12:01 computer networkd[204]: nw_nat64_post_new_ifstate successfully changed NAT64 ifstate from 0x4 to 0x8000000000000000 <-- here it starts working

Sep 16 08:12:10 computer networkd[204]: nw_nat64_post_new_ifstate successfully changed NAT64 ifstate from 0x8000000000000000 to 0x4 <-- here it stops working


#5

Same problem here. Then I noticed that I still had the LAN IPv6 address on my computer:

enp30s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
...
        inet6 fdf9:5b71:1388::836  prefixlen 128  scopeid 0x0<global>

  1. So you can still SSH to the router over IPv6: root@fdf9:5b71:1388::1.
  2. Then set “maindhcp” to “1” instead of “0” in /etc/config/dhcp (“odhcpd” section).
  3. Run “/etc/init.d/odhcpd restart”.
  4. Connect again with DHCP; you should now get an IPv4 address.
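
The steps above can also be scripted once you are logged in over SSH. This is only a minimal sketch: it assumes the stock section name `odhcpd` in /etc/config/dhcp and uses OpenWrt's `uci` tool instead of editing the file by hand. The function name and the `ODHCPD_INIT` override are hypothetical and exist only to keep the example self-contained.

```shell
#!/bin/sh
# Sketch: make odhcpd the main DHCPv4 server, mirroring steps 2 and 3 above.
# Assumption: the section in /etc/config/dhcp is named 'odhcpd' (stock config).
# ODHCPD_INIT is a hypothetical override for the init script path.

enable_odhcpd_maindhcp() {
    # Step 2: switch maindhcp from '0' to '1' and persist the change.
    uci set dhcp.odhcpd.maindhcp='1' || return 1
    uci commit dhcp || return 1
    # Step 3: restart odhcpd so it picks up the new setting.
    "${ODHCPD_INIT:-/etc/init.d/odhcpd}" restart
}
```

Calling `enable_odhcpd_maindhcp` on the router and then reconnecting a client should correspond to step 4.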

However, the update seems to be really broken. Still no IPv4 connection despite a factory reset, “register DHCP devices in DNS” doesn’t work, … :frowning:


#6

Thanks for this, it looks like it fixed the problem. But from just changing the config, I really don’t understand what the real problem is.


#7

Is it possible to disable auto-update?

They seem to be incapable of updating the software without breaking the config every single time.

It’s very disappointing …


#8

It is. Perhaps a more interesting option (new since 3.8) is to delay them in Foris/Updater:

Updates will be installed with an adjustable delay. You can also approve them manually.


#9

Update broke my DNS which is unacceptable. I’m also pretty fed up with this crap, how can I install vanilla LEDE/OpenWRT?


#10

LEDE/OpenWRT doesn’t support the hardware yet; see Integrating Hardware support into LEDE/OpenWrt by the community


#11

I’ve noticed. I still believe in this device and I don’t mind minor stuff breaking every now and then; it comes with the territory. But please put all your effort into getting upstream support for the hardware instead of trying to improve things yourself. I appreciate all the effort put into this and I know circumstances change all the time. But the proper priority should be LEDE support first, so any users fed up with Omnia issues can just move to LEDE. Once that is an option, you can try to re-invent the wheel all you want in Omnia; I would definitely try it every now and then to see if you can surpass LEDE.


#12

Just delaying updates is a completely unacceptable solution. You’d only be pushing the problems to a later date. I think the real solution is to do a slow rolling release and implement a voluntary release feedback loop via on-device metrics. The Turris team would buy time this way to fix discovered problems.


#13

AFAIK it’s meant for those who don’t want any changes to happen while they’re away and someone else needs the internet.

Anyone can already use the RC or nightly channels. The problem is that very few want them (and often it’s those who want the new features).


#14

The real problem is that providing feedback by typing messages here on the forum is not effective. Not everyone participates, and not everyone has a perfect picture of what’s going on in the device. Voluntary RC/nightly channels are also not a solution when you’re doing completely automated updates. It’s unrealistic to believe any testing device farm could catch errors that only appear in production. An option to postpone updates just because they’re shipped broken creates another, much deeper problem: it basically declares that the main feature of the router - security - is something that can or should be disabled. Give users too many settings and options and this router will end up with the same broken security as competing products.

Like I said, slow rolling updates and metrics collection (similar to what is already being collected) would buy time for the team to fix discovered problems without users having to postpone critical updates or disable them altogether.
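
To make the “slow rolling updates” idea concrete, here is a rough sketch of one way such a gate could work. It is purely hypothetical and not how the Turris updater behaves: each device hashes a stable identifier into a bucket from 0 to 99 and only installs a release once the published rollout percentage exceeds its bucket. The function names and the use of `cksum` as the hash are my own assumptions.

```shell
#!/bin/sh
# Hypothetical sketch of a staged ("slow rolling") rollout gate.
# Not part of the Turris updater; names and hashing scheme are assumptions.

# Map a device identifier to a stable bucket in the range 0..99.
rollout_bucket() {
    printf '%s' "$1" | cksum | awk '{ print $1 % 100 }'
}

# Succeed if the device ($1) falls inside the first $2 percent of buckets.
in_rollout() {
    [ "$(rollout_bucket "$1")" -lt "$2" ]
}
```

The team could then raise the percentage step by step while the collected metrics stay healthy, and pause it as soon as devices in the early buckets report problems.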


#15

Yes, rolling releases would be awesome, and we want that. But we can’t do it with the current OpenWRT build system. Let me describe the current build process. We have to build all packages at once. If a single package we want fails, we have to repeat the whole process. Missing dependencies are sometimes hidden, which results in race conditions during the build, and there are a lot of those. Sometimes whole parts of the package tree are silently dropped simply because of a typo in one of the packages. The result is weird bugs, missing packages and overall build instability (a few builds fail for no obvious reason while the next build from the same sources passes). And we can’t build a single package update without worrying about the rest of the system. So with the current build system we can’t do rolling releases (although we are trying to do so with the nightly and test branches). The OpenWRT build system is simply designed for a “build it once” approach.

But writing a new build system is very time-consuming. We have a colleague who has worked with Open Build Service and has extensive knowledge in this field. But we currently have more pressing issues, such as getting in sync with the LEDE tree. So a new build system won’t arrive any time soon.


#16

Thank you, I really appreciate your careful explanation. Despite my comments, I remain optimistic about the mission of this device and the software you’re making.


#17

Once again you f***** over people running dnsmasq. This time not by disabling it (resetting the listen port to 0, as you used to) but by force-feeding kresd to your users.

In the system/startup page I have dnsmasq: Enabled, kresd: Disabled, yet rebooting leaves me with kresd running and listening on port 53… This effectively disables my differentiated upstream nameservers, leaving my internal work network and my local hosts (dnsmasq-specific hosts file) unresolvable and DHCP disabled. The result is an annoyed family calling to dump that piece of junk and buy something that works.

I’m seriously thinking about writing a script that will run 5 min after boot, check that dnsmasq is listening on port 53, and if not, try to remedy the situation (killall -9 kresd && ensure “option port ‘53’” is listed && /etc/init.d/dnsmasq start)
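
A rough sketch of what such a script could look like follows. It assumes the router's `netstat` supports `-p` (program names) and that the first dnsmasq section in /etc/config/dhcp is the active one. The function names and the `NETSTAT` and `DNSMASQ_INIT` overrides are hypothetical.

```shell
#!/bin/sh
# Sketch of the watchdog described above.
# Assumptions: netstat supports -p, and dhcp.@dnsmasq[0] is the active section.
# NETSTAT and DNSMASQ_INIT are hypothetical override hooks.

# Succeeds if the netstat listing on stdin shows dnsmasq bound to port 53.
dnsmasq_owns_port_53() {
    awk '$4 ~ /:53$/ && $NF ~ /dnsmasq/ { found = 1 } END { exit !found }'
}

# If dnsmasq is not on port 53: kill kresd, force the port, start dnsmasq.
restore_dnsmasq() {
    if ${NETSTAT:-netstat} -lnup 2>/dev/null | dnsmasq_owns_port_53; then
        return 0    # dnsmasq already answers on port 53, nothing to do
    fi
    killall -9 kresd 2>/dev/null
    uci set 'dhcp.@dnsmasq[0].port=53'
    uci commit dhcp
    "${DNSMASQ_INIT:-/etc/init.d/dnsmasq}" start
}
```

Running something like `sleep 300 && restore_dnsmasq` from a boot-time script would match the “5 min after boot” idea.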


#18

As a now long-time openSUSE contributor, I’m happy at least that it was considered (even if there are far more pressing issues to tackle first, as you mentioned): it sounds like exactly the right solution for this problem.


#19

Thanks RFC2822, this worked for me!

Honestly, it really is a bug when the upgrade to 3.8.1 changes the setting this way (all my devices were unable to get an IP - except one notebook…)


closed #20