BTRFS scrub uncorrectable errors on /

kukulin · September 18, 2017, 10:39am

I run btrfs scrub weekly on all btrfs filesystems on my Omnia, including root.
Now I have started getting uncorrectable errors on /:

    root@turris:~# btrfs scrub status /
    scrub status for e5a12292-b46b-4343-aea7-29f3b83a5986
            scrub started at Mon Sep 18 17:24:21 2017 and finished after 00:00:15
            total bytes scrubbed: 579.09MiB with 86 errors
            error details: csum=86
            corrected errors: 0, uncorrectable errors: 86, unverified errors: 0

Is there any way to correct these errors?
I’m afraid the filesystem will soon end up read-only and the system will no longer be usable.
Thank you.
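
For a weekly job like the one described, the count of uncorrectable errors can be pulled out of the status text so that a script can react to it. A minimal sketch in POSIX shell, assuming the output format shown above (other btrfs-progs releases may print a different layout); `scrub_uncorrectable` is a hypothetical helper name, not part of btrfs-progs:

```shell
#!/bin/sh
# Hypothetical helper: read `btrfs scrub status` output on stdin and print
# the number that follows "uncorrectable errors:" (prints nothing if that
# text is absent from the input).
scrub_uncorrectable() {
    sed -n 's/.*uncorrectable errors: \([0-9][0-9]*\).*/\1/p'
}

# Possible use on a live system (needs root and btrfs-progs):
#   n=$(btrfs scrub status / | scrub_uncorrectable)
#   [ "${n:-0}" -gt 0 ] && logger -t scrub "uncorrectable errors on /: $n"
```

The helper only reports the count; it does not repair anything.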

kukulin · September 18, 2017, 11:17pm

On top of that, I see these errors in the kernel log:

    kernel 2017-09-18 17:24:28.000 [100035.598729] BTRFS error (device mmcblk0p1): bdev /dev/mmcblk0p1 errs: wr 0, rd 0, flush 0, corrupt 172, gen 0
    kernel 2017-09-18 17:24:28.000 [100035.533183] BTRFS error (device mmcblk0p1): unable to fixup (regular) error at logical 24756224 on dev /dev/mmcblk0p1

I mounted a previous snapshot made by schnapps and ran btrfs scrub, with the same result.

Is there any simple way to fix this?
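
The kernel message quoted above names a logical address, and `btrfs inspect-internal logical-resolve` can map such an address back to file paths when it belongs to file data. A rough sketch, assuming the message format quoted above and one message per input line; `logical_addrs` is a hypothetical helper. Because this filesystem uses mixed Data+Metadata block groups, an address may also point at metadata, in which case no path comes back:

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print each distinct
# logical address found in "unable to fixup (regular) error at logical N"
# messages, one per line, sorted numerically.
logical_addrs() {
    sed -n 's/.*unable to fixup (regular) error at logical \([0-9][0-9]*\).*/\1/p' |
        sort -un
}

# Possible use on a live system (needs root):
#   dmesg | logical_addrs | while read -r addr; do
#       btrfs inspect-internal logical-resolve "$addr" /
#   done
```

Files identified this way could then be restored from a known-good copy. Scrub checks the whole filesystem rather than a single subvolume, which would explain why scrubbing from an older schnapps snapshot reports the same errors.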

vcunat · September 19, 2017, 9:51am

Closely related thread, but not too useful (yet): Updater selhal: Failed to provide the approval report: Read-only file system [3.7.1]

kukulin · September 19, 2017, 1:19pm

Thank you, that gives me a direction. BTW, I think this kind of btrfs error happens after a power outage (at least in my case: I often have blackouts and need to put a UPS in place).

One more question: I fixed such issues on my mSATA and USB NAS by creating mirrors (the mSATA split into 2 partitions of the same size with btrfs raid1 on top; the NAS with 4 disks on USB in btrfs raid10).
This fixed the issue, and scrub now always has a good copy of the data/metadata to reconstruct the filesystem from.
Is there any possibility of splitting mmcblk0p1 into 2 equal parts and using btrfs raid1, with schnapps etc. working the same way? This could be useful and would probably avoid such errors.
Thanks again for trying to help me.

vcunat · September 19, 2017, 1:33pm

This makes me wonder if the integrated storage implements write barriers correctly (or whatever is actually used in similar HW to achieve ACID-like properties).

vcunat · September 19, 2017, 1:35pm

You don’t even need two devices to have duplication on BTRFS. It actually does duplicate metadata by default on rotating drives. See “dup profiles” in man mkfs.btrfs.

Weafyr · September 19, 2017, 1:41pm

Cutting the power is something I have to do several times whenever an update arrives (to get my 5 GHz Wi-Fi back). Maybe that, together with LXC containers, is what went wrong and damaged the FS in my case. Thanks for the pointer.

kukulin · September 19, 2017, 11:21pm

This is interesting. I checked / on my Omnia and the profile is single:

    root@turris:~# btrfs fi df /
    System, single: total=32.00MiB, used=4.00KiB
    Data+Metadata, single: total=1.48GiB, used=578.53MiB
    GlobalReserve, single: total=12.00MiB, used=0.00B

Is it safe to use balance to change the profile to DUP?
Will this help, or will the Omnia's storage deduplicate the duplication done by btrfs?

vcunat · September 20, 2017, 7:11am

I don’t think your / is on a rotating drive, so that’s consistent with the btrfs docs:

A single device filesystem will default to DUP, unless a SSD is detected. Then it will default to single.

I don’t expect the storage device to be smart enough to deduplicate this, but I know almost nothing about it.
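
The conversion asked about would be done with `btrfs balance` and its convert filters. A cautious sketch follows; `dup_convert_cmd` is a hypothetical wrapper that only prints the command so it can be reviewed first. Two assumptions need checking beforehand: that the installed kernel and btrfs-progs accept the DUP profile for data on this filesystem, and that there is enough unallocated space, since DUP stores every block twice. With mixed Data+Metadata block groups, as reported by `btrfs fi df /` above, data and metadata share chunks, so the same profile is passed to both filters:

```shell
#!/bin/sh
# Hypothetical wrapper: print (not run) the balance command that would convert
# data and metadata to the DUP profile on the filesystem mounted at $1.
# The mount point defaults to / when no argument is given.
dup_convert_cmd() {
    mnt=${1:-/}
    printf 'btrfs balance start -dconvert=dup -mconvert=dup %s\n' "$mnt"
}

# Review the printed command, then run it by hand as root, for example:
#   dup_convert_cmd /
#   btrfs fi df /    # afterwards, check which profiles are reported
```

DUP only gives scrub a second copy to repair from in the future; it does not restore blocks that are already corrupt, and a balance may well abort when it reaches them.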

mazhead · September 24, 2017, 5:49pm

I had issues with btrfs on / too, and I traced the errors to LXC container files.
I think a power outage combined with LXC could be the root cause of the errors. I will probably move my containers to an external drive to try to prevent this.

Personally, I have had only bad experiences with btrfs… The features are great, but the stability not so much.

kukulin · September 26, 2017, 4:13am

I have LXC containers on mSATA with btrfs, and it failed several times with the same issue: uncorrectable errors. The issue disappeared (at least for the last 2 months) when I split it into 2 partitions and made them a raid1 (mirror).
There are still errors sometimes, but they are corrected by scrub.
The biggest mistake I made was at the beginning, when I put the LXC containers on an external USB disk with btrfs: I got a crash every week.

But I have btrfs raid10 on 4 USB-connected drives as a NAS and there is no issue at all: I’m streaming movies to 2 TVs and use it as backup for several RPis and computers. I also have its storage mounted into an LXC container with Nextcloud… and all seems to work OK.

kukulin · September 26, 2017, 4:15am

Thank you for the advice and help.
I’ll try it as soon as time allows me to reflash the Omnia with medkit (or when the system crashes completely).