Little rant here - this year I have already spent over 3 hours fixing fallout from broken updates. 3.8 completely broke my router: infinite reboot loop. Your updates are fragile because they rely on poorly written updater software. Sorry, but if you tried making updates immutable, i.e. one package download with everything, it would probably fix many of your current problems.
After subsequent factory resets I reached a state where I had to disable updates in the wizard and keep the original IP address range for the DHCP pool. Did you guys actually test updating a stock router, as shipped, to 3.8 with "updates on" and "DHCP IP range change" selected in the wizard? It doesn't look like you did, because it fails on random stuff!
I thought I had it right, but then none of my WiFi clients got an IPv4 address from DHCP. What kind of weird error is that? How can this happen?
I love your project, but I just can't see how it would work for the average Joe who just turns things off and on again.
My WiFi clients keep getting and losing IPv4 addresses from DHCP within a couple of seconds, which results in spotty connectivity. Does anyone know what the problem is? I'm looking in /var/log/messages and I don't see anything out of the ordinary.
Ok, I rebooted the router and the same DHCP problem also appeared on LAN.
Looking at the client DHCP log:
Sep 16 08:12:01 computer networkd[204]: nw_nat64_post_new_ifstate successfully changed NAT64 ifstate from 0x4 to 0x8000000000000000 <-- here it starts working
Sep 16 08:12:10 computer networkd[204]: nw_nat64_post_new_ifstate successfully changed NAT64 ifstate from 0x8000000000000000 to 0x4 <-- here it stops working
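To put a number on "a couple of seconds", here is a minimal Python sketch that pulls the NAT64 ifstate transitions out of client log lines like the two above and reports the time between them. The function names `ifstate_changes` and `seconds_between_changes` are made up for illustration, and the regular expression only assumes the line layout seen in this excerpt.

```python
import re
from datetime import datetime

# Matches lines like:
# "Sep 16 08:12:01 computer networkd[204]: ... NAT64 ifstate from 0x4 to 0x8000000000000000"
LINE_RE = re.compile(
    r"^(\w{3} +\d+ \d\d:\d\d:\d\d) .*NAT64 ifstate from "
    r"(0x[0-9a-fA-F]+) to (0x[0-9a-fA-F]+)"
)


def ifstate_changes(log_text):
    """Return a list of (timestamp, old_state, new_state) tuples."""
    changes = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        # Syslog timestamps carry no year; 2000 is only a placeholder
        # so that the timestamps can be subtracted from each other.
        stamp = datetime.strptime("2000 " + match.group(1), "%Y %b %d %H:%M:%S")
        changes.append((stamp, int(match.group(2), 16), int(match.group(3), 16)))
    return changes


def seconds_between_changes(changes):
    """Return the gaps, in seconds, between consecutive state changes."""
    return [
        (later[0] - earlier[0]).total_seconds()
        for earlier, later in zip(changes, changes[1:])
    ]
```

Fed the two lines above, this gives a single gap between the "starts working" and "stops working" transitions. Run over a longer capture, it would show whether the flapping is periodic.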
I've noticed. I still believe in this device and I don't mind minor stuff breaking every now and then; it comes with the territory. But please put all your effort into getting upstream support for the hardware instead of trying to improve things yourself. I appreciate all the effort put into this and I know circumstances change all the time. But proper prioritization should be LEDE support first, so any users fed up with Omnia issues can just move to LEDE. Then, once that is an option, you can try to re-invent the wheel all you want in Omnia. I would definitely try it every now and then and see if you can surpass LEDE.
It's a completely unacceptable solution to just delay updates. You'd just be pushing the problems to a later date. I think the real solution is to do a slow rolling release and implement a voluntary release feedback loop via on-device metrics. The Turris team would buy time this way to fix discovered problems.
The real problem is that providing feedback by typing messages here on the forum is not effective. Not everyone participates or has a perfect picture of what's going on in the device. Voluntary RC/nightly channels are also not a solution when you're doing completely automated updates. It's unrealistic to believe any test device farm could catch errors that only appear in production. An option to postpone updates just because they're shipped broken creates another, much deeper problem and basically declares that the main feature of the router - security - is something that can/should be disabled. Give users too many settings/options and this router will end up with the same broken security as competitor products.
Like I said, slow rolling updates and metrics collection (similar to what is already being collected) would buy time for the team to fix discovered problems without users having to postpone critical updates or disable them altogether.
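To make the idea concrete, here is a minimal Python sketch of how a staged rollout might be gated. It is not how the Turris updater works; every name in it (`rollout_bucket`, `should_update`, `next_rollout_percent`, the 2% failure threshold) is a hypothetical choice for illustration. Each device hashes its ID into a stable bucket from 0 to 99, the server publishes a rollout percentage, and that percentage is only raised while the collected reports stay healthy.

```python
import hashlib


def rollout_bucket(device_id, release):
    """Map a device to a stable bucket 0..99 for a given release."""
    digest = hashlib.sha256(f"{release}:{device_id}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % 100


def should_update(device_id, release, rollout_percent):
    """A device updates once the published percentage covers its bucket."""
    return rollout_bucket(device_id, release) < rollout_percent


def next_rollout_percent(current, ok, failed, max_failure_rate=0.02, step=2):
    """Decide the next rollout percentage from voluntary device reports.

    ok / failed are counts of devices reporting a good or bad update.
    """
    total = ok + failed
    if total == 0:
        return current          # no feedback yet, hold steady
    if failed / total > max_failure_rate:
        return 0                # too many failures: halt the rollout
    return min(100, max(1, current * step))
```

Because the bucket depends on the release name, a different set of devices goes first each time, and a device that is covered at 10% is still covered at 50%, so nobody flips back and forth as the rollout widens.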
Yes, rolling releases would be awesome, and we want that. But we can't do it with the current OpenWRT build system. Let me describe the state of the current build process. We have to build all packages at once. If a single package we want fails, we have to repeat the whole process. Missing dependencies are sometimes hidden and result in race conditions during the build, and there are a lot of those. Sometimes whole parts of the package tree are silently dropped simply because of a typo in one of the packages. The result is weird bugs, missing packages and overall build instability (a few builds fail for no obvious reason while the next build from the same sources passes). And we can't build a single package update without worrying about the rest of the system. So with the current build system we can't do rolling releases (although we are trying to do so with the nightly and test branches). The OpenWRT build system is simply designed for a "build it once" approach.
But writing a new build system is very time-consuming. We have a colleague who has worked with Open Build Service and has extensive knowledge in this field. But we currently have more pressing issues, such as getting in sync with the LEDE tree. So a new build system won't come any time soon.
Thank you, I really appreciate your careful explanation. Despite my comments, I remain optimistic about the mission of this device and the software you're making.
Once again you f***** over people running dnsmasq. This time not by disabling it (resetting the listen port to 0, as you used to) but by force-feeding kresd to your users.
In the system/startup page I have dnsmasq: Enabled, kresd: Disabled, yet rebooting leaves me with kresd running and listening on port 53... This effectively disables my differentiated upstream nameservers, leaving my internal work network and my local hosts (dnsmasq-specific hosts file) unresolvable and DHCP disabled. The result is an annoyed family, calling for me to dump that piece of junk and buy something that works.
I'm seriously thinking about writing a script that will run 5 min after boot, check that dnsmasq is listening on port 53, and if not, try to remedy the situation (killall -9 kresd && ensure "option port '53'" is listed && /etc/init.d/dnsmasq start)
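A rough Python sketch of such a watchdog follows, under several assumptions: Python 3 is installed on the router, `netstat -lnup` is available and prints the owning program in its last column, and the dnsmasq port lives in the first dnsmasq section of the UCI `dhcp` config. The function names `listener_on_port`, `remedy` and `check_and_fix` are made up here. The remedy steps are the ones listed above, with `uci set` standing in for "ensure option port '53' is listed".

```python
import subprocess


def listener_on_port(netstat_output, port=53):
    """Return the program bound to `port` according to netstat text, or None."""
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a udp/tcp socket line.
        if len(fields) < 5 or not fields[0].startswith(("udp", "tcp")):
            continue
        local_address = fields[3]              # e.g. "0.0.0.0:53" or ":::53"
        if local_address.rsplit(":", 1)[-1] != str(port):
            continue
        pid_and_program = fields[-1]           # e.g. "1234/dnsmasq"
        if "/" in pid_and_program:
            return pid_and_program.split("/", 1)[1]
    return None


def remedy():
    """Kill kresd, make sure dnsmasq is set to port 53, start dnsmasq."""
    subprocess.run(["killall", "-9", "kresd"], check=False)
    # Assumption: the first dnsmasq section in /etc/config/dhcp holds the port.
    subprocess.run(["uci", "set", "dhcp.@dnsmasq[0].port=53"], check=False)
    subprocess.run(["uci", "commit", "dhcp"], check=False)
    subprocess.run(["/etc/init.d/dnsmasq", "start"], check=False)


def check_and_fix():
    """Run the remedy unless dnsmasq already owns port 53."""
    result = subprocess.run(
        ["netstat", "-lnup"], stdout=subprocess.PIPE, universal_newlines=True
    )
    if listener_on_port(result.stdout) != "dnsmasq":
        remedy()
```

The "5 min after boot" part could be handled by calling `check_and_fix()` from a small script that a boot hook starts after sleeping for 300 seconds. Keeping the netstat parsing in its own function means it can be checked against saved output on a desktop before anything is allowed to kill processes on the router.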
As a now long-time openSUSE contributor, I'm happy at least that it was considered (even if there are far more pressing issues to tackle first, as you mentioned): it sounds like exactly the right solution for this problem.