Update broke my router

kixorz · September 16, 2017, 3:39am

Little rant here - this year I have already spent over 3 hrs fixing fallout from broken updates. Version 3.8 completely broke my router: infinite reboot loop. Your updates are fragile because they rely on poorly written updater software. Sorry, but if you tried making updates immutable, i.e. one package download with everything, it would probably fix many of your current problems.

After subsequent factory resets I reached a state where it was necessary to disable updates in the wizard and keep the original IP address for the DHCP pool. Did you guys actually test the stock router you shipped being updated to 3.8 with “updates on” and “DHCP ip range change” in the wizard? It doesn’t look like you did, because it fails on random stuff!

I thought I had it right, but then none of my WiFi clients could lease an IPv4 address over DHCP. What kind of weird error is that? How can this happen?

I love your project, but I just can’t see how it would work for the average Joe who just turns things off and on again.

kixorz · September 16, 2017, 12:42pm

My WiFi clients keep getting and losing IPv4 addresses from DHCP within a couple of seconds, and this results in spotty connectivity. Does anyone know what the problem is? I’m looking in /var/log/messages and I don’t see anything out of the ordinary.

S474N · September 16, 2017, 12:51pm

Same problem?

kixorz · September 16, 2017, 1:15pm

Ok, I rebooted the router and the same DHCP problem also appeared on LAN.

Looking at the client DHCP log:

Sep 16 08:12:01 computer networkd[204]: nw_nat64_post_new_ifstate successfully changed NAT64 ifstate from 0x4 to 0x8000000000000000 <-- here it starts working

Sep 16 08:12:10 computer networkd[204]: nw_nat64_post_new_ifstate successfully changed NAT64 ifstate from 0x8000000000000000 to 0x4 <-- here it stops working

rfc2822 · September 16, 2017, 2:10pm

Same problem here. Then I noticed that I still had the LAN IPv6 address on my computer:

enp30s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
...
        inet6 fdf9:5b71:1388::836  prefixlen 128  scopeid 0x0<global>

So you can ssh to root@fdf9:5b71:1388::1.
Then set “maindhcp” to “1” instead of “0” in /etc/config/dhcp (“odhcpd” section).
Then run “/etc/init.d/odhcpd restart”.
Connect again with DHCP; you should now get an IPv4 address.
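
If you would rather script the config change than edit the file by hand, something like the following might do it. This is only a sketch: it assumes the setting appears as a line like `option maindhcp '0'` in /etc/config/dhcp, and the function name `enable_maindhcp` and its optional path argument are made up for illustration.

```shell
#!/bin/sh
# Sketch only: flip "option maindhcp" from 0 to 1 in an OpenWrt-style
# dhcp config file. enable_maindhcp and its path argument are
# illustrative names, not something shipped on the router.
enable_maindhcp() {
    cfg="${1:-/etc/config/dhcp}"
    # Keep a backup next to the file so the change can be reverted.
    cp "$cfg" "$cfg.bak" || return 1
    # Rewrite lines such as:  option maindhcp '0'  (quotes are optional).
    sed -i "s/^\([[:space:]]*option[[:space:]]\{1,\}maindhcp[[:space:]]\{1,\}\)['\"]\{0,1\}0['\"]\{0,1\}[[:space:]]*\$/\1'1'/" "$cfg"
}

# On the router, followed by the restart step from this post:
#   enable_maindhcp && /etc/init.d/odhcpd restart
```

The backup copy ends up in /etc/config/dhcp.bak, so it is easy to compare or roll back if DHCP still misbehaves.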

However, the update seems to be really broken. No IPv4 connection even after a factory reset, “register DHCP devices in DNS” doesn’t work, …

kixorz · September 16, 2017, 4:48pm

Thanks for this, it looks like it fixed the problem. But from just changing the config, I really don’t understand what the real problem is.

anon71276338 · September 17, 2017, 12:20pm

Is it possible to disable auto update?

They seem to be incapable of updating the software without breaking the config every single time.

It’s very disappointing …

vcunat · September 17, 2017, 1:06pm

It is. Perhaps a more interesting option (new since 3.8) is to delay them in Foris/Updater.

Updates will be installed with an adjustable delay. You can also approve them manually.

patrickm · September 17, 2017, 2:47pm

The update broke my DNS, which is unacceptable. I’m also pretty fed up with this crap, how can I install vanilla LEDE/OpenWRT?

vcunat · September 17, 2017, 3:34pm

LEDE/OpenWRT doesn’t support the hardware yet; see the thread “Integrating Hardware support into LEDE/OpenWrt by the community” (SW tweaks, Turris forum).

patrickm · September 17, 2017, 3:41pm

I’ve noticed. I still believe in this device and I don’t mind minor stuff breaking every now and then, it comes with the territory. But please put all effort into getting upstream support for the hardware instead of trying to improve things yourself. I appreciate all the effort put into this and I know circumstances change all the time. But proper prioritization should be LEDE support first, so any users fed up with Omnia issues can just move to LEDE. Then once that is an option, you can try to re-invent the wheel all you want in Omnia, and I would definitely try it every now and then to see if you can surpass LEDE.

kixorz · September 17, 2017, 3:54pm

Just delaying updates is a completely unacceptable solution. You’d just be pushing the problems to a later date. I think the real solution is to do a slow rolling release and implement a voluntary release feedback loop via on-device metrics. The Turris team would buy time this way to fix discovered problems.

vcunat · September 17, 2017, 4:11pm

AFAIK it’s meant for those who don’t want any changes to happen when they’re away and someone else needs the internet.

Anyone can already use the RC or nightly channels. The problem is that very few want them (and often it’s those who want the new features).

kixorz · September 17, 2017, 4:28pm

The real problem is that providing feedback by typing messages here on the forum is not effective. Not everyone participates or has a perfect picture of what’s going on in the device. Voluntary RC/nightly channels are also not a solution when you’re doing completely automated updates. It’s unrealistic to believe any testing device farm setup could catch errors that only appear in production. An option to postpone updates just because they’re shipped broken creates another, much deeper problem and basically declares that the main feature of the router - security - is something that can/should be disabled. Give users too many settings/options and this router will end up with the same broken security as competitor products.

Like I said, slow rolling updates and metrics collection (similar to what is already being collected) would buy time for the team to fix discovered problems without users having to postpone critical updates or disable them altogether.
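
To illustrate what I mean by a slow rollout, here is a rough sketch. It assumes each router has some stable ID (serial number or similar) and that a release carries a rollout percentage; the function names, the ID format and the percentages are all made up by me, and this is not how the current updater works.

```shell
#!/bin/sh
# Hypothetical staged-rollout check; IDs and percentages are invented.

# Map a device ID to a stable bucket in 0..99 using the POSIX cksum CRC.
rollout_bucket() {
    crc=$(printf '%s' "$1" | cksum | cut -d' ' -f1)
    echo $((crc % 100))
}

# Succeed if the device falls inside the first $2 percent of buckets.
in_rollout() {
    [ "$(rollout_bucket "$1")" -lt "$2" ]
}

# Example: install only if this device is in the first 10% wave.
#   in_rollout "$DEVICE_ID" 10 && echo "install update now"
```

Because the bucket depends only on the ID, a device that got the update at 10% stays included when the percentage is raised to 50% and then 100%, so problems reported by early devices can pause the rollout before everyone is hit.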

cynerd · September 17, 2017, 6:42pm

Yes, rolling releases would be awesome, and we want that. But we can’t do that with the current OpenWRT build system. Let me just describe the state of the current build process. We have to build all packages at once. If a single package we want fails, we have to repeat the whole process. Missing dependencies are sometimes hidden and result in race conditions during the build, and there are a lot of those. Sometimes whole parts of the package tree are silently dropped simply because of a typo in one of the packages. The result is weird bugs, missing packages and overall build instability (a few builds fail for no obvious reason while the next build from the same sources passes). And we can’t build a single package update without worrying about the rest of the system. So with the current build system we can’t do rolling releases (although we are trying to do so with the nightly and test branches). The OpenWRT build system is simply designed for a “build it once” approach.

But writing a new build system is very time-consuming. We have a colleague who worked with Open Build Service and has huge knowledge in this field. But we currently have more pressing issues, such as getting in sync with the LEDE tree. So a new build system won’t happen any time soon.

kixorz · September 17, 2017, 8:50pm

Thank you, I really appreciate your careful explanation. Despite my comments, I remain optimistic about the mission of this device and the software you’re making.

bogeskov · September 18, 2017, 5:00am

Once again you f***** over people running dnsmasq. This time not by disabling it (resetting the listen port to 0, as you used to) but by force-feeding kresd to your users.

In the system/startup page I have dnsmasq: Enabled, kresd: Disabled, yet rebooting leaves me with kresd running and listening on port 53… This effectively disables my differentiated upstream nameservers, rendering my internal work network and my local hosts (dnsmasq-specific hosts file) unresolvable, and leaves DHCP disabled. The result is an annoyed family calling for me to dump that piece of junk and buy something that works.

I’m seriously thinking about writing a script that will run 5 min after boot, check that dnsmasq is listening on port 53, and if not, try to remedy the situation (killall -9 kresd && ensure “option port ‘53’” is listed && /etc/init.d/dnsmasq start).
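
Roughly along these lines, as an untested sketch. It assumes the router’s netstat supports `-lnp` and prints “PID/program” as the last field; the function names are just mine, and the “option port” check is left out here.

```shell
#!/bin/sh
# Hypothetical watchdog sketch; nothing here ships with the router.

# Succeed if the listener table on stdin shows dnsmasq bound to port 53.
# Expects `netstat -lnp`-style lines: field 4 is the local address and
# the last field is "PID/program".
dnsmasq_on_port_53() {
    awk '$4 ~ /:53$/ && $NF ~ /\/dnsmasq$/ { found = 1 } END { exit !found }'
}

# Kill the resolver that grabbed the port and bring dnsmasq back up.
repair_dns() {
    killall -9 kresd
    /etc/init.d/dnsmasq start
}

# Intended use, about 5 minutes after boot:
#   sleep 300
#   netstat -lnp 2>/dev/null | dnsmasq_on_port_53 || repair_dns
```

Keeping the check as a filter on netstat output means it can be tried out with pasted lines before it is allowed to kill anything.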

einar · September 19, 2017, 8:13am

As a now long-time openSUSE contributor, I’m happy at least that it was considered (even if there are far more pressing issues to tackle first, as you mentioned): it sounds like exactly the solution for this problem.

Vojtech_Pihrt · September 26, 2017, 6:42am

Thanks RFC2822, this worked for me!

Honestly, it really is a bug when the upgrade to 3.8.1 changes the setting this way (all my devices were unable to get an IP, except one notebook…)