Hi, I have two 8TB HDDs in a Turris Omnia NAS (I have the Turris NAS kit). Each drive is divided into two 4TB partitions. On each drive the first partition is Btrfs and the second is ext4. The first partitions were meant to form a RAID1.
Lately I am experiencing a strange issue: sdb is losing its UUIDs.
Everything works as expected for some time, then sdb suddenly loses its UUIDs and can't be accessed anymore. sda is fine, but I need to restart the Turris to be able to use sdb again.
It started after I "turned off" my Turris before a longer vacation by disconnecting it from power. After I returned I turned it back on, and the next day I found that the RAID1 was corrupted (at that time I had ext4 on all partitions). After some investigation I concluded that one partition had failed somehow and was then synced to the second partition, so both partitions were corrupted and no data could be saved.
I am running lsblk -o NAME,LABEL,UUID every 5 minutes to find out when the UUIDs are lost, and this is the last occurrence (sda1 and sdb1 are Btrfs, sda2 and sdb2 are ext4, sdc is a USB drive dedicated to LXC containers):
LSBLK: 10/14/20 03:15:01
NAME LABEL UUID
sda
|-sda1 SYNC 18eba303-049c-422b-b646-c3c133f614f1
`-sda2 NORAID-2 80c7074b-2f64-42f9-8ecb-2248d468ec47
sdb
|-sdb1 SYNC-BACKUP 69e5f8e4-ea72-4781-8861-bab44e7c4b8b
`-sdb2 NORAID-1 05714f0b-0895-481d-917e-121f81c93d32
sdc
`-sdc1 d43df750-b8cc-4ec8-92d9-106117c84785
mtdblock0
mtdblock1
mmcblk0
`-mmcblk0p1 04571fc5-265b-48c6-bfca-051f104323e3
mmcblk0boot0
mmcblk0boot1
mmcblk0rpmb
LSBLK: 10/14/20 03:20:01
NAME LABEL UUID
sda
|-sda1 SYNC 18eba303-049c-422b-b646-c3c133f614f1
`-sda2 NORAID-2 80c7074b-2f64-42f9-8ecb-2248d468ec47
sdb
|-sdb1
`-sdb2
sdc
`-sdc1 d43df750-b8cc-4ec8-92d9-106117c84785
mtdblock0
mtdblock1
mmcblk0
`-mmcblk0p1 04571fc5-265b-48c6-bfca-051f104323e3
mmcblk0boot0
mmcblk0boot1
mmcblk0rpmb
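Comparing these snapshots by eye gets tedious, so the check could be automated. Here is a minimal sketch in Python, assuming the snapshots are appended to one text file in exactly the format shown above (a `LSBLK:` timestamp line followed by the lsblk table). The function names `parse_snapshots` and `find_lost_uuids` are made up for illustration, and the UUID pattern only covers the 36-character form seen in this log (a vfat-style short UUID would not match).

```python
import re

# Matches the 8-4-4-4-12 hex UUID form that appears in the log above.
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
)


def parse_snapshots(log_text):
    """Return a list of (timestamp, {device_name: uuid_or_None}) tuples."""
    snapshots = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("LSBLK:"):
            # A new snapshot starts; keep the timestamp text as-is.
            snapshots.append((line[len("LSBLK:"):].strip(), {}))
            continue
        if not snapshots or line.startswith("NAME"):
            # Skip the column header and anything before the first snapshot.
            continue
        tokens = line.split()
        # Drop the tree-drawing prefix ("|-" or "`-") from the device name.
        name = tokens[0].lstrip("|`-")
        # The label column may be empty, so the UUID is taken from the last
        # token, and only if it actually looks like a UUID.
        last = tokens[-1]
        uuid = last if len(tokens) > 1 and UUID_RE.match(last) else None
        snapshots[-1][1][name] = uuid
    return snapshots


def find_lost_uuids(snapshots):
    """Return (timestamp, device) pairs where a UUID seen in the previous
    snapshot is missing in the current one."""
    lost = []
    for (_, prev), (ts, cur) in zip(snapshots, snapshots[1:]):
        for name, uuid in prev.items():
            if uuid is not None and cur.get(name) is None:
                lost.append((ts, name))
    return lost
```

Called as `find_lost_uuids(parse_snapshots(open(path).read()))`, where `path` is wherever the cron job writes its output, it should list the timestamp of the first snapshot in which each partition no longer has a UUID. That timestamp can then be matched against the kernel log.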
Has anybody experienced the same issue, or does anybody know what I should test?
I tried a factory reset of the Turris, I replaced both HDDs (originally I had two 4TB drives, now I have two 8TB), and I tried turning off all LXC containers, hdd-idle, removing the mdadm RAID1 and syncing files with rsync once per day instead, etc. Nothing helped.
Now I am thinking about a hardware issue - maybe the driver for sdb is broken somehow. Does anybody know if, and how, it is possible to test the hardware driver?