Hi,
This is a long post; I’m looking for ideas, brainstorming, discussion and experience. Admins: I may have picked the wrong subforum, so feel free to move it.
I have a Turris Omnia with an attached USB 3.0 disk (a new WD MyBook, 4 TB), formatted as ext4 and shared through Samba, Syncthing and SSHFS. Two days after an update, we noticed the disk was not accessible, and SSHing into the router revealed that the disk was completely empty: around 200 GB of data was gone. Not only had the data disappeared, but the lost+found directory, which is created automatically when the filesystem is formatted, was missing too. The disk looked as if someone had run rm -rf on it, except that the journal was also empty.
The Omnia is updated regularly and automatically, using its factory settings. I only receive e-mails about updates and planned maintenance reboots. The last update was installed on the 10th of May, and the device rebooted at 3:30 on Sunday, the 13th of May. We discovered the failure on Tuesday.
Since last evening I’ve been running ext4magic to recover the lost data. It looks like the data is still there, so I may be able to recover it without major loss. ext4magic also showed that the journal is empty. I haven’t run smartctl yet, because data recovery has priority. dmesg did not show any errors related to the disk, but I admit it wasn’t the first thing I ran. Since I didn’t know what was happening, the first thing I ran was fsck, which may have “repaired” the filesystem by clearing the journal or restoring filesystem control structures on the disk.
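Once recovery is finished, the SMART check could be scripted. A minimal sketch, assuming the output of something like smartctl -A /dev/sda has been saved as text (the device name is a placeholder, and a USB enclosure such as the MyBook may need smartctl's -d sat option to reach the drive): it pulls the raw values of attributes 5, 197 and 198, which are commonly read as signs of reallocated, pending or unreadable sectors. The sample table is invented to show the format and is not data from this disk; the function names are hypothetical.

```python
import re

# Invented sample in the layout of `smartctl -A` output (not data from this disk).
# Columns: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
"""

# Attribute IDs commonly watched for bad sectors (assumption: the drive
# reports them under these standard IDs).
WATCHED = {5, 197, 198}

ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+"  # ID, name, flag
    r"\s+\d+\s+\d+\s+\d+"                  # VALUE, WORST, THRESH
    r"\s+\S+\s+\S+\s+\S+"                  # TYPE, UPDATED, WHEN_FAILED
    r"\s+(\d+)"                            # first number of RAW_VALUE
)


def watched_raw_values(smart_text):
    """Map attribute name -> raw value for the watched sector-health IDs."""
    values = {}
    for line in smart_text.splitlines():
        m = ROW.match(line)
        if m and int(m.group(1)) in WATCHED:
            values[m.group(2)] = int(m.group(3))
    return values


def suspicious(values):
    """Sorted names of watched attributes whose raw value is non-zero."""
    return sorted(name for name, raw in values.items() if raw > 0)


if __name__ == "__main__":
    vals = watched_raw_values(SAMPLE)
    print(vals)
    print(suspicious(vals))
```

A non-zero raw value in any of these would point towards the disk rather than the router; all zeros would not rule out a hardware problem, but would make a plain sector failure less likely.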
Now I’m looking for explanations of what might have happened. Of course, if the disk failed, I want to have it replaced, as it’s only about 4 months old. If it’s the Omnia’s fault, then this should be investigated and fixed. However, I currently can’t wrap my head around this; it doesn’t make any sense to me. If the disk had failed, I’d expect some bad sectors, error messages in dmesg, or partially inaccessible data, so I don’t think the described behaviour was caused by a hardware failure. Similarly, I can’t see how an update, or any unintended software action, would result in this. If something, perhaps a bug in Syncthing, deleted the data in the normal way, the journal would record that, wouldn’t it? Even then, my daemons run as unprivileged users, not root, while the data belonged to multiple users, including root (as with the lost+found directory). Does anybody have any idea? Is there anything else I should look into?
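The disk-failure symptoms mentioned above could be checked in a saved kernel log by scanning for a few typical messages. A minimal sketch: the patterns (EXT4-fs errors, I/O errors, USB resets and disconnects) are assumptions about common kernel wording, not an exhaustive list, and the script and file names are hypothetical. Note that dmesg reads the kernel ring buffer, which does not survive a reboot, so messages from before the 13 May reboot would only be available in a persistent system log, if one is kept.

```python
import re
import sys

# Assumed message patterns; exact kernel wording differs between versions.
PATTERNS = {
    "ext4_error": re.compile(r"EXT4-fs error"),
    "io_error": re.compile(r"I/O error"),
    "usb_reset": re.compile(r"usb \S+: reset .*USB device"),
    "usb_disconnect": re.compile(r"usb \S+: USB disconnect"),
}


def classify_kernel_log(lines):
    """Group kernel log lines by the disk-related pattern(s) they match."""
    hits = {}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.setdefault(name, []).append(line.rstrip("\n"))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: dmesg > kernel.txt; python scan_log.py kernel.txt
    with open(sys.argv[1]) as f:
        for name, found in classify_kernel_log(f).items():
            print(f"{name}: {len(found)} line(s)")
            for line in found:
                print("   ", line)
```

An empty result would match what I saw in dmesg; repeated USB resets or disconnects around the reboot would suggest looking at the enclosure or power supply rather than at the update itself.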
Fortunately, the disk serves mostly as a backup rather than primary storage, so there is no real damage apart from the stress and lost time.