Optional migration from Turris OS 3.x for advanced users

horada · June 22, 2021, 5:11am

Additional update… Yesterday evening I started the migration. After some time the version in /etc/turris-version changed to 5.2.2, and shortly after that the router rebooted and I received one email notification stating that btrfs support was removed (as mentioned in the doc). But I still haven’t received the final email saying that the router was completely migrated (it has been more than 10 hours now).
The Internet connection seems to work fine and I’m able to ssh to the router, but the web UI is not available.
So my question is: is there any other way to check whether the migration is still in progress or whether it broke somehow? Or should I just wait longer?

In /var/log/messages I see the following messages related to the updater:

Jun 21 22:00:01 turris crond[26856]: (root) CMD (/usr/bin/updater-supervisor -d --autorun --rand-sleep --no-network-fail)
Jun 21 22:00:04 turris updater-supervisor: Suspending updater start for 2156 seconds
Jun 21 22:36:01 turris updater-supervisor: Running pkgupdate
Jun 21 22:36:06 turris updater[29895]: repository.lua.lua:49 (Globals): Target Turris OS: 5.2.2
Jun 21 22:36:08 turris updater[29895]: requests.lua:126 (extra_annul_ignore): Install extra option "version" is obsolete and should not be used. Specify version directly to package name instead.
        ^^^ this line is repeated multiple times ^^^^

Jun 21 22:36:27 turris updater[29895]: planner.lua:356 (pkg_plan): Requested package luci-i18n-wshaper-en that is missing, ignoring as requested.
Jun 21 22:36:27 turris updater[29895]: planner.lua:356 (pkg_plan): Requested package luci-i18n-wshaper-cs that is missing, ignoring as requested.
Jun 21 22:36:27 turris updater[29895]: planner.lua:356 (pkg_plan): Requested package luci-i18n-sqm-cs that is missing, ignoring as requested.
Jun 21 22:36:27 turris updater[29895]: planner.lua:356 (pkg_plan): Requested package luci-i18n-sqm-en that is missing, ignoring as requested.
Jun 21 22:36:27 turris updater[29895]: planner.lua:356 (pkg_plan): Requested package luci-i18n-rainbow-en that is missing, ignoring as requested.
Jun 21 22:36:27 turris updater[29895]: planner.lua:356 (pkg_plan): Requested package luci-i18n-rainbow-cs that is missing, ignoring as requested.
Jun 21 22:36:28 turris updater-supervisor: pkgupdate reported no errors
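
To keep an eye on further updater activity without scrolling through the whole log, a small filter along these lines may help. It is only a sketch: the function name updater_log_tail is made up for illustration, and it assumes the syslog line format quoted above with /var/log/messages as the default path.

```shell
# Sketch: show the most recent updater lines from a syslog file,
# dropping the repetitive "missing package" notices.
# The default path is the one from the post; pass another file as $1.
updater_log_tail() {
    log="${1:-/var/log/messages}"
    # Match "updater[PID]:" and "updater-supervisor:" as the logging program,
    # which skips cron lines that merely mention the supervisor binary.
    grep -E ' updater(-supervisor)?(\[[0-9]+\])?: ' "$log" \
        | grep -v 'ignoring as requested' \
        | tail -n 20
}

# Usage on the router:
#   updater_log_tail
```

If nothing new shows up here across several scheduled runs, the updater is probably idle rather than still migrating.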

cynerd · June 22, 2021, 7:22am

Did you use opkg-cl to install tos3to4? The notification is sent on removal of that package, which should happen sometime after the final reboot. If you used opkg instead, you have to remove the install request for the tos3to4 package from /etc/updater/conf.d/opkg-auto.lua.

The repeated line for sure comes from something you have in the /etc/updater/conf.d scripts. Some of those scripts are going to contain something like Install("foo", { version = "<1.0" }), where the syntax has changed. This change happened some time ago so I do not remember the details, but I think that it wasn’t used as part of the opkg integration, and thus the only way it could have got there is some hack included in those files. It is most likely something we tackled together in the past.

horada · June 22, 2021, 7:48am

Yes, I used opkg-cl; I copy-pasted the whole command: opkg-cl update && opkg-cl install tos3to4 && updater-supervisor -d. Also, tos3to4 is not mentioned in /etc/updater/conf.d/opkg-auto.lua.

So is it safe to try to restart the router?

As for the version issue, it seems to be caused by the following statements in /etc/updater/conf.d/opkg-auto.lua:

Install("foo", { ignore = { "missing" } })

cynerd · June 22, 2021, 11:35am

It should be safe to restart the router. Can you please check that the tos3to4 package is not present in the system? It might be that for some reason the notification was not sent before the reboot and was thus wiped. I am going to look into it, although it is kind of minor if tos3to4 is gone and everything works.
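
Both checks (package gone, no leftover install request) can be scripted roughly as follows. This is a sketch, not an official procedure: has_install_request is a hypothetical helper name, and the config path is simply the one mentioned earlier in the thread.

```shell
# Sketch: succeed (and print the matching lines) if an updater config file
# contains an Install("<pkg>" ...) request for the given package.
has_install_request() {
    pkg="$1"
    conf="${2:-/etc/updater/conf.d/opkg-auto.lua}"
    # Only the first argument of Install(...) is checked, so option values
    # such as "missing" are not mistaken for package names.
    grep -E "^[[:space:]]*Install\([[:space:]]*[\"']${pkg}[\"']" "$conf"
}

# Usage on the router:
#   opkg list-installed | grep '^tos3to4 ' || echo "tos3to4 is not installed"
#   has_install_request tos3to4 || echo "no install request for tos3to4"
```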

Hmm… I see. Updater seems to be reporting the wrong option there. I think I see why.

horada · June 22, 2021, 12:01pm

Package tos3to4 wasn’t present, reboot passed correctly.

There was one other problem: lighttpd wasn’t running. The reason turned out to be a duplicate definition of the SSL configuration (I already had SSL configured for a Let’s Encrypt certificate in ssl-enable-letsencrypt.conf, which conflicts with ssl-enable.conf installed from the lighttpd-mod-openssl package). So it was tightly related to my own configuration changes.
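
For anyone hitting the same thing, a quick way to spot such a clash is to list every config snippet that sets up SSL on its own. The sketch below assumes the snippets live in /etc/lighttpd/conf.d (an assumption, pass the real directory as the first argument) and only looks for the standard ssl.engine and ssl.pemfile directives; list_ssl_confs is a made-up name.

```shell
# Sketch: list lighttpd config snippets that configure SSL themselves.
# More than one file in the output is a hint that definitions may collide.
list_ssl_confs() {
    dir="${1:-/etc/lighttpd/conf.d}"   # assumed location of the snippets
    grep -lE '^[[:space:]]*ssl\.(engine|pemfile)' "$dir"/*.conf 2>/dev/null
}

# Usage on the router:
#   list_ssl_confs
#   lighttpd -t -f /etc/lighttpd/lighttpd.conf   # syntax check of the full config
```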

Just one question about that: is it safe to remove the lighttpd-mod-openssl package when I have SSL configured in a different file with the Let’s Encrypt cert? (For now I just renamed the ssl-enable.conf config and lighttpd started correctly.)

cynerd · June 22, 2021, 12:06pm

You do not want to remove lighttpd-mod-openssl, as you are for sure using it in your SSL configuration. What you want to remove is lighttpd-https-cert. You have to convince updater that you really do not want that package, which can be done with an updater config containing:

Uninstall('lighttpd-https-cert')
Install("lighttpd-mod-openssl")

horada · June 22, 2021, 12:14pm

Thanks for the correction, I copy&pasted the wrong package.

And thanks for all the info and help!

Pepe · June 24, 2021, 9:27pm

Since today, we have enabled opt-in migration for Turris 1.x routers by using the package list in Foris. I updated the first post in the thread.

Patryk · September 22, 2021, 5:14pm

I tried the Migration to Turris OS 5.x option in the Foris interface two weeks ago, but unfortunately the new Turris OS 5.2.6 didn’t boot correctly. I was unable to access the Omnia through the network, but fortunately I have a UART cable, and the console was throwing lots of messages like these:

[  200.655798] mvneta f1030000.ethernet eth0: bad rx status 0cc10000 (crc error), size=66
[  200.983105] mvneta f1030000.ethernet eth0: bad rx status 0cc10000 (crc error), size=67
[  201.148152] mvneta f1030000.ethernet eth0: bad rx status 0cc10000 (crc error), size=66
[  201.217080] mvneta f1030000.ethernet eth0: bad rx status 0cc10000 (crc error), size=67
[  201.293618] mvneta f1030000.ethernet eth0: bad rx status 4fa10000 (crc error), size=441
[  201.713065] mvneta f1030000.ethernet eth0: bad rx status 4fa10000 (crc error), size=69
[  201.721029] mvneta f1030000.ethernet eth0: bad rx status 4fa10000 (crc error), size=69
[  202.147187] mvneta f1030000.ethernet eth0: bad rx status 0cc10000 (crc error), size=66
[  202.241244] mvneta f1030000.ethernet eth0: bad rx status 0cc10000 (crc error), size=66
[  202.305048] mvneta f1030000.ethernet eth0: bad rx status 4fa10000 (crc error), size=430
[  202.313087] mvneta f1030000.ethernet eth0: bad rx status 4ea10000 (crc error), size=450

and

[    0.000000] Division by zero in kernel.
[    0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.244 #0
[    0.000000] Hardware name: Marvell Armada 380/385 (Device Tree)
[    0.000000] [<c010feec>] (unwind_backtrace) from [<c010b36c>] (show_stack+0x10/0x14)
[    0.000000] [<c010b36c>] (show_stack) from [<c07d2ba0>] (dump_stack+0x94/0xa8)
[    0.000000] [<c07d2ba0>] (dump_stack) from [<c07d122c>] (Ldiv0+0x8/0x10)
[    0.000000] [<c07d122c>] (Ldiv0) from [<c051b0a4>] (clk_cpu_recalc_rate+0x28/0x2c)
[    0.000000] [<c051b0a4>] (clk_cpu_recalc_rate) from [<c051749c>] (clk_register+0x3f4/0x67c)
[    0.000000] [<c051749c>] (clk_register) from [<c0a1a7b0>] (of_cpu_clk_setup+0x16c/0x310)
[    0.000000] [<c0a1a7b0>] (of_cpu_clk_setup) from [<c0a1a010>] (of_clk_init+0x16c/0x214)
[    0.000000] [<c0a1a010>] (of_clk_init) from [<c0a039c8>] (time_init+0x24/0x2c)
[    0.000000] [<c0a039c8>] (time_init) from [<c0a00c4c>] (start_kernel+0x35c/0x4cc)
[    0.000000] [<c0a00c4c>] (start_kernel) from [<00000000>] (  (null))
[    0.000000] Division by zero in kernel.
[    0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.244 #0
[    0.000000] Hardware name: Marvell Armada 380/385 (Device Tree)
[    0.000000] [<c010feec>] (unwind_backtrace) from [<c010b36c>] (show_stack+0x10/0x14)
[    0.000000] [<c010b36c>] (show_stack) from [<c07d2ba0>] (dump_stack+0x94/0xa8)
[    0.000000] [<c07d2ba0>] (dump_stack) from [<c07d122c>] (Ldiv0+0x8/0x10)
[    0.000000] [<c07d122c>] (Ldiv0) from [<c051b0a4>] (clk_cpu_recalc_rate+0x28/0x2c)
[    0.000000] [<c051b0a4>] (clk_cpu_recalc_rate) from [<c051749c>] (clk_register+0x3f4/0x67c)
[    0.000000] [<c051749c>] (clk_register) from [<c0a1a7b0>] (of_cpu_clk_setup+0x16c/0x310)
[    0.000000] [<c0a1a7b0>] (of_cpu_clk_setup) from [<c0a1a010>] (of_clk_init+0x16c/0x214)
[    0.000000] [<c0a1a010>] (of_clk_init) from [<c0a039c8>] (time_init+0x24/0x2c)
[    0.000000] [<c0a039c8>] (time_init) from [<c0a00c4c>] (start_kernel+0x35c/0x4cc)
[    0.000000] [<c0a00c4c>] (start_kernel) from [<00000000>] (  (null))

I only managed to get my Omnia back to a working state by rolling back to the snapshot from before I enabled migration in Foris.

Orzech · September 22, 2021, 8:24pm

I’ve upgraded my Omnia from 3.x to 5.x today.

After more than an hour of waiting, I logged into Reforis and saw my previous Wi-Fi settings etc. I didn’t remember every setting, but it looked similar to what I had set up myself on 3.x. I have a shitty e-mail provider and thought that perhaps the e-mail notification had been sent but just hadn’t reached my inbox yet.

So I went through all (I think) configuration screens and pressed the “Save” button on each of them. After doing so on the package selection screen, I got a notification that the upgrade process had finished. It looks like this notification was triggered by me pressing the “Save” button there.

I restarted the router after that and everything seems fine. But is it really? How can I verify that? How can I check, for example, that my firewall rules are OK and nothing was opened by accident?

PS. Thank you for your hard work and support!

cynerd · September 23, 2021, 8:38am

@Patryk this is a very weird error, as a kernel fault like this would of course be reproduced by other users as well. I looked into the code and I can’t see any clear way the division by zero could happen in that specific function. That leads me to the conclusion that the cause might be a corrupted kernel image. Are you sure that the migration finished correctly? The router gets rebooted automatically as part of the migration process. Did that happen, or was there a forced reboot? The second option is that the MMC might just be worn out and out of spare sectors, so new writes could result in corrupted files. The kernel would report a load of BTRFS errors if that were the case (most likely during the update).
In general, I would appreciate it if you could send me the exported snapshot with your working 3.x version so I can attempt the migration directly (of course only if you are willing to do so). You can export a snapshot using schnapps export X, where X is the number of the 3.x snapshot that worked for you.

@Orzech I am happy to hear that the migration finished in your case. The process is that we declare the migration finished only after the first successful regular check for updates. That check runs every four hours (so up to four hours after the migration) or can be forced by clicking on Check and install updates in reForis (as well as by a few other actions that trigger an update to install or remove additional software).
On the topic of verification: you do it the same way as you would with a new router. Feel free to browse reForis and LuCI and look around over SSH. For example, the firewall settings are compatible between the 3.x and 5.x versions, so there should be no change.

Orzech · September 23, 2021, 10:09am

@cynerd

Yeah, I think I pressed Check and install updates a few seconds before Save on the package selection screen, as the buttons are on adjoining screens, so perhaps it was just a coincidence that the notification appeared after I pressed Save. It’s not a problem that Save was pressed before the update check had finished, is it?

As for the firewall, could you perhaps point me to some default configuration so I could compare mine with it?

Skippi · September 23, 2021, 10:33am

Firewall settings are in the file /etc/config/firewall and the default settings should be in /etc/config/firewall-opkg.

You can mount the snapshot created before the migration to version 5 over SSH.
List all snapshots:
schnapps list
Then find the line just before the first one with the description “Automatic post-update snapshot (TurrisOS 5.2.7)”.
In my case it was #431:

    # | Type      | Size        | Date                        | Description
------+-----------+-------------+-----------------------------+------------------------------------
  428 | time      |    34.31MiB | 2021-09-12 01:05:02 +0200   | Snapshot created by cron
  429 | pre       |    10.80MiB | 2021-09-15 17:37:35 +0200   | Automatic pre-update snapshot
  430 | post      |    10.81MiB | 2021-09-15 17:37:42 +0200   | Automatic post-update snapshot
  431 | pre       |    10.83MiB | 2021-09-18 19:18:20 +0200   | Automatic pre-update snapshot
  432 | post      |    12.68MiB | 2021-09-18 19:26:15 +0200   | Automatic post-update snapshot (TurrisOS 5.2.7)
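
With a long snapshot list, picking that line by eye gets tedious. The search can be sketched in awk as below, assuming the table layout shown above (columns separated by |) and that the first 5.x snapshot carries "TurrisOS 5" in its description; find_pre_migration_snapshot is a hypothetical helper name.

```shell
# Sketch: read `schnapps list` output on stdin and print the number of the
# last "pre" snapshot that precedes the first line mentioning the marker.
find_pre_migration_snapshot() {
    marker="${1:-TurrisOS 5}"
    awk -F'|' -v marker="$marker" '
        # Reduce the first two columns to a bare number and a bare type.
        { num = $1; type = $2; gsub(/[^0-9]/, "", num); gsub(/[^a-z]/, "", type) }
        # First snapshot made by the new system: report what we saw before it.
        index($0, marker) { if (last != "") print last; exit }
        # Remember the most recent pre-update snapshot.
        num != "" && type == "pre" { last = num }
    '
}

# Usage on the router:
#   schnapps list | find_pre_migration_snapshot
```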

You can mount this snapshot

root@turris:~# schnapps mount 431
Snapshot 431 mounted in /mnt/snapshot-@431

and compare the files /etc/config/firewall and /mnt/snapshot-@431/etc/config/firewall.
Don’t forget to unmount the snapshot when you are finished:
umount /mnt/snapshot-@431
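
While the snapshot is still mounted, the comparison itself can be done with a plain unified diff. A minimal sketch, with the mount point and file path taken from this post as defaults; the function name and the third argument (a path prefix for the live file, only useful when trying this off the router) are made up for illustration.

```shell
# Sketch: unified diff of one file between a mounted snapshot and the live system.
# Lines starting with "-" come from the snapshot, "+" from the current config.
diff_against_snapshot() {
    snap="${1:-/mnt/snapshot-@431}"
    file="${2:-/etc/config/firewall}"
    root="${3:-}"   # hypothetical prefix for the live file, empty on the router
    diff -u "$snap$file" "$root$file"
}

# Usage on the router (before the umount):
#   diff_against_snapshot
```

No output means the two files are identical.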

cynerd · September 23, 2021, 11:11am

Both of those buttons run the update in the background. It is implemented in such a way that only one instance is spawned at a time, so clicking multiple times causes no issues.

einar · September 25, 2021, 12:56pm

Did two migrations so far:

One on a 2017 Turris Omnia: no problems whatsoever save SSL (as mentioned elsewhere in this thread)
One on an Indiegogo (second shipment) Turris Omnia

The latter went horribly wrong. After the reboot the network configuration was somehow scrambled: eth2 (WAN) had no link and so couldn’t connect to the Internet. The whole post-install configuration also seems to have gone awry, as syslog-ng wasn’t functional and thus there were no logs. When I ran pkgupdate (after attaching a USB 4G modem to get some connectivity), it reported a dependency problem with zip; removing zip just changed a handful of packages and did nothing else.

Does OpenWrt have a way to fiddle with the physical interfaces? My hunch is that the switch configuration was somehow set up incorrectly and thus the wrong ports were bridged together.

After about 30 minutes of fiddling I gave up and did a factory reset with the TurrisOS 5.2 medkit and restored the necessary configuration from a backup. After the reset, everything went smoothly.

I can’t fathom what could have gone wrong during the update. I might set up remote logging with syslog-ng to avoid future issues (or just attach some storage for persistent logs).

Comodore125 · September 26, 2021, 6:42pm

I have just finished the migration of an IG Turris Omnia.
It required more clicks in Foris to get it running, and then in reForis (which I first had to fix by following the thread “Can not use Reforis on TOS 5+ - ControllerMissing”, post #17 by encacz).

The HDD attached to the Omnia still works fine.
PPPoE - no issues.
VLANs migrated, but with one issue: the interfaces were visible in LuCI but did not actually work and had to be set manually with set:

 set network."VLAN_NAME".ifname='lan"0 or 1,2,3,4,5"."VLANID"'

e.g. set network.management.ifname='lan0.10'
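
The set lines above look like UCI commands as LuCI displays them for pending changes, so they can presumably be fed to uci batch, which reads such commands from standard input. The helper below is only a sketch under that assumption: vlan_ifname_cmds is a made-up name, and it just prints the commands so they can be reviewed before anything is applied.

```shell
# Sketch: print the UCI commands that bind a network section to a tagged port.
# Assumes the "set" lines quoted above are UCI commands.
vlan_ifname_cmds() {
    section="$1"   # network section name, e.g. management
    port="$2"      # port name, e.g. lan0
    vid="$3"       # VLAN ID, e.g. 10
    printf "set network.%s.ifname='%s.%s'\n" "$section" "$port" "$vid"
    printf 'commit network\n'
}

# Usage on the router (check the output first, then apply and reload):
#   vlan_ifname_cmds management lan0 10
#   vlan_ifname_cmds management lan0 10 | uci batch
#   /etc/init.d/network reload
```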

So these were three major issues.
One small issue is that automatic updates did not do all the work and had to be triggered several times for the process to do something.
I also received no notifications during the process (I had them enabled).
After triggering a manual update in the manually fixed reForis, I got the first notification about a restart, and that was all.

EDIT: Third issue: collectd is still not working. Same problems as in the v15.05 OpenWrt fork. I was able to make it work in the past, but the updater overwrote my changes and broke it again.
So I hope this package will get fixed soon. It is time.

edavid · September 26, 2021, 7:56pm

Where did you have to type this, and for what use?
Is set a shell command on TOS 5?

Comodore125 · September 26, 2021, 7:58pm

I typed it inside an SSH session on the Omnia. Just a normal command.
Yep, it is some piece of CLI that’s a basic dependency in TOS 5.

Use: as stated, to make the VLANs work after migration.

edavid · September 26, 2021, 8:00pm

OK, and where is the doc for this?
I have more and more re

Comodore125 · September 26, 2021, 8:01pm

I did not notice any sensible documentation. I just reverse-engineered the command from LuCI.