RAID - what am I doing wrong?

I have two Turris devices: mine and a friend's.

With mine, I attached two HDDs formatted ext4 to the Turris, then mounted sda1. As my other posting suggested, I don’t think I did this correctly.

As such, I decided to follow this posting.

Before I started the instructions, I mounted each HDD and reformatted it as ext4. Next, I wrote a file to each to ensure that they were writable. I removed the files, unmounted the HDDs, and rebooted.

Once the device was available, I did some preliminary checks:

~# lsblk
NAME         MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda            8:0    0   1.8T  0 disk  
sdb            8:16   0   1.8T  0 disk  
mtdblock0     31:0    0     1M  0 disk  
mtdblock1     31:1    0     7M  0 disk  
mmcblk0      179:0    0   7.3G  0 disk  
`-mmcblk0p1  179:1    0   7.3G  0 part  /
mmcblk0boot0 179:8    0     4M  1 disk  
mmcblk0boot1 179:16   0     4M  1 disk  
mmcblk0rpmb  179:24   0     4M  0 disk  

I created the RAID1 device:

mdadm --create --verbose /dev/md0 --level=mirror --raid-devices=2 /dev/sda /dev/sdb

I checked its status:

# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] 
md0 : active raid1 sdb[1] sda[0]
      1953383360 blocks super 1.2 [2/2] [UU]
      [>....................]  resync =  0.0% (823488/1953383360) finish=276.6min speed=117641K/sec

I checked the block devices:

# lb
NAME           SIZE MOUNTPOINT  STATE   FSTYPE MODEL      SERIAL     UUID                                 LABEL
sda            1.8T             running        ST32000542                                                 
`-md0          1.8T                                                                                       
sdb            1.8T             running        ST32000542                                                 
`-md0          1.8T                                                                                       
mtdblock0        1M                                                                                       
mtdblock1        7M                                                                                       
mmcblk0        7.3G                                       0x0ea2ec8d                                      
`-mmcblk0p1    7.3G /                   btrfs                        9e857b64-9b5e-4a3f-8433-ccf6f4ff47fb 
mmcblk0boot0     4M                                                                                       
mmcblk0boot1     4M                                                                                       
mmcblk0rpmb      4M                           

Some time later, I returned to check the status of mdadm:

# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] 
md0 : active raid1 sdb[1](F) sda[0]
      1953383360 blocks super 1.2 [2/1] [U_]
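
To spot this degraded state from a script rather than by eye, a minimal sketch follows. It assumes the member status (`[UU]`, `[U_]`) is the last field on the line that follows each `mdN :` line, as in the output above. The function name `mdstat_degraded` is made up for illustration.

```shell
#!/bin/sh
# mdstat_degraded [FILE]
# Print "<array> <status>" for every md array with a missing member,
# i.e. whose status field contains "_". Reads /proc/mdstat by default.
mdstat_degraded() {
    awk '
        # Remember the array name from lines such as "md0 : active raid1 ..."
        /^md[0-9]+ :/ { name = $1 }
        # The status field looks like [UU] or [U_] and is the last field
        /\[[U_]+\]/ {
            status = $NF
            if (name != "" && status ~ /_/) print name " " status
        }
    ' "${1:-/proc/mdstat}"
}

# Example: mdstat_degraded          (checks the live /proc/mdstat)
# For the output above this prints: md0 [U_]
```

A healthy array (`[UU]`) produces no output, so an empty result can be read as "nothing degraded".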

Details:

# mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Sat Jul 22 17:49:16 2017
     Raid Level : raid1
     Array Size : 1953383360 (1862.89 GiB 2000.26 GB)
  Used Dev Size : 1953383360 (1862.89 GiB 2000.26 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Sat Jul 22 17:55:52 2017
          State : clean, degraded 
 Active Devices : 1
Working Devices : 1
 Failed Devices : 1
  Spare Devices : 0

    Number   Major   Minor   RaidDevice State
       0       8        0        0      active sync   /dev/sda
       1       0        0        1      removed

       1       8       16        -      faulty spare   /dev/sdb

dmesg has logged a number of errors:

[  383.127878] md: bind<sda>
[  383.137854] md: bind<sdb>
[  383.138175] md/raid1:md0: not clean -- starting background reconstruction
[  383.138181] md/raid1:md0: active with 2 out of 2 mirrors
[  383.138232] md0: detected capacity change from 0 to 2000264560640
[  383.138316] md: resync of RAID array md0
[  383.138322] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[  383.138325] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
[  383.138330] md: using 128k window, over a total of 1953383360k.
[  493.027523] ata2.00: exception Emask 0x0 SAct 0x700 SErr 0x0 action 0x6 frozen
[  493.034778] ata2.00: cmd 61/80:40:00:94:16/01:00:01:00:00/40 tag 8 ncq 196608 out
[  493.034778]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  493.049709] ata2.00: cmd 61/80:48:80:95:16/05:00:01:00:00/40 tag 9 ncq 720896 out
[  493.049709]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  493.064632] ata2.00: cmd 61/00:50:00:9b:16/02:00:01:00:00/40 tag 10 ncq 262144 out
[  493.064632]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  493.079642] ata2: hard resetting link
[  493.079700] ata1.00: exception Emask 0x0 SAct 0xc0 SErr 0x0 action 0x6 frozen
[  493.086862] ata1.00: cmd 60/00:30:00:9d:16/05:00:01:00:00/40 tag 6 ncq 655360 in
[  493.086862]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  493.101703] ata1.00: cmd 60/80:38:00:a2:16/01:00:01:00:00/40 tag 7 ncq 196608 in
[  493.101703]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  493.116531] ata1: hard resetting link
[  503.077117] ata2: softreset failed (1st FIS failed)
[  503.082016] ata2: hard resetting link
[  503.117126] ata1: softreset failed (1st FIS failed)
[  503.122020] ata1: hard resetting link
[  513.076723] ata2: softreset failed (1st FIS failed)
[  513.081617] ata2: hard resetting link
[  513.116717] ata1: softreset failed (1st FIS failed)
[  513.121607] ata1: hard resetting link
[  548.074836] ata2: softreset failed (1st FIS failed)
[  548.079731] ata2: limiting SATA link speed to 1.5 Gbps
[  548.079736] ata2: hard resetting link
[  548.114903] ata1: softreset failed (1st FIS failed)
[  548.119809] ata1: limiting SATA link speed to 1.5 Gbps
[  548.119816] ata1: hard resetting link
[  553.074580] ata2: softreset failed (1st FIS failed)
[  553.079476] ata2: reset failed, giving up
[  553.083493] ata2.00: disabled
[  553.083501] ata2.00: device reported invalid CHS sector 0
[  553.083505] ata2.00: device reported invalid CHS sector 0
[  553.083508] ata2.00: device reported invalid CHS sector 0
[  553.083525] ata2: EH complete
[  553.083565] sd 1:0:0:0: [sdb] tag#11 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[  553.083574] sd 1:0:0:0: [sdb] tag#11 CDB: opcode=0x2a 2a 00 01 16 9b 00 00 02 00 00
[  553.083579] blk_update_request: I/O error, dev sdb, sector 18258688
[  553.089910] sd 1:0:0:0: [sdb] tag#12 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[  553.089917] sd 1:0:0:0: [sdb] tag#12 CDB: opcode=0x2a 2a 00 01 16 95 80 00 05 80 00
[  553.089924] blk_update_request: I/O error, dev sdb, sector 18257280
[  553.090595] md/raid1:md0: Disk failure on sdb, disabling device.
[  553.090595] md/raid1:md0: Operation continuing on 1 devices.
[  553.090660] md: md0: resync interrupted.
[  553.107928] sd 1:0:0:0: [sdb] tag#13 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[  553.107935] sd 1:0:0:0: [sdb] tag#13 CDB: opcode=0x2a 2a 00 01 16 94 00 00 01 80 00
[  553.107939] blk_update_request: I/O error, dev sdb, sector 18256896
[  553.344570] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[  558.344321] ata1.00: qc timeout (cmd 0xec)
[  558.344335] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[  558.344340] ata1.00: revalidation failed (errno=-5)
[  558.349232] ata1: hard resetting link
[  568.353829] ata1: softreset failed (1st FIS failed)
[  568.358726] ata1: hard resetting link
[  578.363359] ata1: softreset failed (1st FIS failed)
[  578.368258] ata1: hard resetting link
[  613.371837] ata1: softreset failed (1st FIS failed)
[  613.376741] ata1: hard resetting link
[  618.581627] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[  628.581219] ata1.00: qc timeout (cmd 0xec)
[  628.581235] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[  628.581239] ata1.00: revalidation failed (errno=-5)
[  628.586149] ata1: hard resetting link
[  638.590824] ata1: softreset failed (1st FIS failed)
[  638.595769] ata1: hard resetting link
[  648.600340] ata1: softreset failed (1st FIS failed)
[  648.605234] ata1: hard resetting link
[  683.607930] ata1: softreset failed (1st FIS failed)
[  683.612866] ata1: hard resetting link
[  688.817603] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[  718.815790] ata1.00: qc timeout (cmd 0xec)
[  718.815805] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[  718.815812] ata1.00: revalidation failed (errno=-5)
[  718.820718] ata1.00: disabled
[  718.820729] ata1.00: device reported invalid CHS sector 0
[  718.820733] ata1.00: device reported invalid CHS sector 0
[  718.820745] ata1: hard resetting link
[  728.825222] ata1: softreset failed (1st FIS failed)
[  728.830118] ata1: hard resetting link
[  738.834666] ata1: softreset failed (1st FIS failed)
[  738.839560] ata1: hard resetting link
[  773.832878] ata1: softreset failed (1st FIS failed)
[  773.838038] ata1: hard resetting link
[  779.042620] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[  779.042651] ata1: EH complete
[  779.042684] sd 0:0:0:0: [sda] tag#9 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[  779.042686] sd 0:0:0:0: [sda] tag#11 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[  779.042692] sd 0:0:0:0: [sda] tag#11 CDB: opcode=0x35 35 00 00 00 00 00 00 00 00 00
[  779.042699] blk_update_request: I/O error, dev sda, sector 8
[  779.042701] sd 0:0:0:0: [sda] tag#9 CDB: opcode=0x28 28 00 01 16 a2 00 00 01 80 00
[  779.042705] blk_update_request: I/O error, dev sda, sector 18260480
[  779.054653] md: super_written gets error=-5
[  779.054699] sd 0:0:0:0: [sda] tag#10 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[  779.054706] sd 0:0:0:0: [sda] tag#10 CDB: opcode=0x28 28 00 01 16 9d 00 00 05 00 00
[  779.054710] blk_update_request: I/O error, dev sda, sector 18259200
[  779.055003] sd 0:0:0:0: [sda] tag#12 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[  779.055008] sd 0:0:0:0: [sda] tag#12 CDB: opcode=0x28 28 00 01 16 a2 00 00 00 08 00
[  779.055011] blk_update_request: I/O error, dev sda, sector 18260480
[  779.055045] sd 1:0:0:0: [sdb] tag#14 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[  779.055049] sd 1:0:0:0: [sdb] tag#14 CDB: opcode=0x28 28 00 01 16 a2 00 00 00 08 00
[  779.055051] blk_update_request: I/O error, dev sdb, sector 18260480
[  779.055062] md/raid1:md0: sda: unrecoverable I/O read error for block 17998336
[  779.055093] sd 0:0:0:0: [sda] tag#13 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[  779.055097] sd 0:0:0:0: [sda] tag#13 CDB: opcode=0x28 28 00 01 16 a2 80 00 00 08 00
[  779.055099] blk_update_request: I/O error, dev sda, sector 18260608
[  779.055123] sd 1:0:0:0: [sdb] tag#15 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[  779.055127] sd 1:0:0:0: [sdb] tag#15 CDB: opcode=0x28 28 00 01 16 a2 80 00 00 08 00
[  779.055129] blk_update_request: I/O error, dev sdb, sector 18260608
[  779.055137] md/raid1:md0: sda: unrecoverable I/O read error for block 17998464
[  779.055170] sd 0:0:0:0: [sda] tag#14 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[  779.055174] sd 0:0:0:0: [sda] tag#14 CDB: opcode=0x28 28 00 01 16 a3 00 00 00 08 00
[  779.055176] blk_update_request: I/O error, dev sda, sector 18260736
[  779.055199] sd 1:0:0:0: [sdb] tag#16 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[  779.055203] sd 1:0:0:0: [sdb] tag#16 CDB: opcode=0x28 28 00 01 16 a3 00 00 00 08 00
[  779.055205] blk_update_request: I/O error, dev sdb, sector 18260736
[  779.055212] md/raid1:md0: sda: unrecoverable I/O read error for block 17998592
[  779.120448] sd 0:0:0:0: [sda] tag#15 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[  779.120458] sd 0:0:0:0: [sda] tag#15 CDB: opcode=0x28 28 00 01 16 9d 00 00 00 08 00
[  779.120462] blk_update_request: I/O error, dev sda, sector 18259200
[  779.127025] md/raid1:md0: sda: unrecoverable I/O read error for block 17997056
[  779.134377] md/raid1:md0: sda: unrecoverable I/O read error for block 17997184
[  779.141692] md/raid1:md0: sda: unrecoverable I/O read error for block 17997312
[  779.149155] md/raid1:md0: sda: unrecoverable I/O read error for block 17997440
[  779.156506] md/raid1:md0: sda: unrecoverable I/O read error for block 17997568
[  779.163818] md/raid1:md0: sda: unrecoverable I/O read error for block 17997696
[  779.171134] md/raid1:md0: sda: unrecoverable I/O read error for block 17997824
[  779.178443] md/raid1:md0: sda: unrecoverable I/O read error for block 17997952
[  779.185840] md/raid1:md0: sda: unrecoverable I/O read error for block 17998080
[  779.193173] md/raid1:md0: sda: unrecoverable I/O read error for block 17998208
[  779.200711] md: checkpointing resync of md0.
[  779.200750] md: super_written gets error=-5
[  779.200840] md: super_written gets error=-5
[  779.200876] RAID1 conf printout:
[  779.200881]  --- wd:1 rd:2
[  779.200885]  disk 0, wo:0, o:1, dev:sda
[  779.200888]  disk 1, wo:1, o:0, dev:sdb
[  779.222612] RAID1 conf printout:
[  779.222618]  --- wd:1 rd:2
[  779.222622]  disk 0, wo:0, o:1, dev:sda
[  779.222690] md: super_written gets error=-5
[  779.222748] md: super_written gets error=-5
[  779.222795] md: super_written gets error=-5
[  779.222854] md: resync of RAID array md0
[  779.222858] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[  779.222861] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
[  779.222866] md: using 128k window, over a total of 1953383360k.
[  779.223057] md: md0: resync done.
[  779.223275] md: super_written gets error=-5
[  779.223323] md: super_written gets error=-5
[ 1205.064107] scsi_io_completion: 26 callbacks suppressed
[ 1205.064121] sd 0:0:0:0: [sda] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[ 1205.064129] sd 0:0:0:0: [sda] tag#1 CDB: opcode=0x28 28 00 e8 e0 88 00 00 00 08 00
[ 1205.064133] blk_update_request: 26 callbacks suppressed
[ 1205.064137] blk_update_request: I/O error, dev sda, sector 3907028992
[ 1205.070630] sd 0:0:0:0: [sda] tag#2 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[ 1205.070636] sd 0:0:0:0: [sda] tag#2 CDB: opcode=0x28 28 00 e8 e0 88 00 00 00 08 00
[ 1205.070640] blk_update_request: I/O error, dev sda, sector 3907028992
[ 1205.077123] Buffer I/O error on dev sda, logical block 488378624, async page read
[ 1205.085128] sd 0:0:0:0: [sda] tag#3 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[ 1205.085137] sd 0:0:0:0: [sda] tag#3 CDB: opcode=0x28 28 00 e8 e0 87 00 00 00 08 00
[ 1205.085142] blk_update_request: I/O error, dev sda, sector 3907028736
[ 1205.091651] sd 0:0:0:0: [sda] tag#5 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[ 1205.091658] sd 0:0:0:0: [sda] tag#5 CDB: opcode=0x28 28 00 e8 e0 87 00 00 00 08 00
[ 1205.091662] blk_update_request: I/O error, dev sda, sector 3907028736
[ 1205.098136] Buffer I/O error on dev md0, logical block 488345824, async page read
[ 1205.106155] sd 1:0:0:0: [sdb] tag#27 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[ 1205.106164] sd 1:0:0:0: [sdb] tag#27 CDB: opcode=0x28 28 00 e8 e0 88 00 00 00 08 00
[ 1205.106169] blk_update_request: I/O error, dev sdb, sector 3907028992
[ 1205.112759] sd 1:0:0:0: [sdb] tag#28 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[ 1205.112766] sd 1:0:0:0: [sdb] tag#28 CDB: opcode=0x28 28 00 e8 e0 88 00 00 00 08 00
[ 1205.112770] blk_update_request: I/O error, dev sdb, sector 3907028992
[ 1205.119228] Buffer I/O error on dev sdb, logical block 488378624, async page read
[ 1205.127186] sd 0:0:0:0: [sda] tag#6 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[ 1205.127199] sd 0:0:0:0: [sda] tag#6 CDB: opcode=0x28 28 00 e8 e0 87 00 00 00 08 00
[ 1205.127206] blk_update_request: I/O error, dev sda, sector 3907028736
[ 1205.133738] sd 0:0:0:0: [sda] tag#7 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[ 1205.133745] sd 0:0:0:0: [sda] tag#7 CDB: opcode=0x28 28 00 e8 e0 87 00 00 00 08 00
[ 1205.133749] blk_update_request: I/O error, dev sda, sector 3907028736
[ 1205.140227] Buffer I/O error on dev md0, logical block 488345824, async page read
[ 1828.017218] sd 0:0:0:0: [sda] tag#8 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[ 1828.017230] sd 0:0:0:0: [sda] tag#8 CDB: opcode=0x1b 1b 00 00 00 00 00
[ 1828.017298] sd 1:0:0:0: [sdb] tag#30 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[ 1828.017305] sd 1:0:0:0: [sdb] tag#30 CDB: opcode=0x1b 1b 00 00 00 00 00
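
A log this long is easier to read when summarized. One rough way, shown here only as a sketch, is to count the "hard resetting link" messages per ATA port. The helper name `ata_reset_counts` is hypothetical; it reads a saved log file if one is given and otherwise falls back to `dmesg`.

```shell
#!/bin/sh
# ata_reset_counts [FILE]
# Count "hard resetting link" kernel messages per ATA port.
ata_reset_counts() {
    # Use the given file if present, otherwise the live kernel log
    if [ -n "$1" ]; then cat "$1"; else dmesg; fi |
    awk '
        /ata[0-9]+: hard resetting link/ {
            # Find the "ataN:" field and strip the trailing colon
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^ata[0-9]+:$/) {
                    p = $i
                    sub(/:$/, "", p)
                    n[p]++
                }
            }
        }
        END { for (p in n) print p, n[p] }
    ' | sort
}

# Example: dmesg > /tmp/dmesg.txt; ata_reset_counts /tmp/dmesg.txt
```

In the log above, resets appear on both ata1 and ata2, so the summary would list both ports rather than a single misbehaving disk.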

I followed a similar process on my friend’s Turris, which resulted in a similar situation. I also ran badblocks -v on both of my friend’s HDDs, but no bad blocks were reported.

What am I doing wrong? I can’t believe that two sets of HDDs would fail like this.

Any assistance is greatly appreciated.

Hi,

Try this:

I made the changes permanent (I hope) by adding this to the /etc/rc.local file:

# RAID fix
echo 1 > /sys/block/sda/device/queue_depth
echo 1 > /sys/block/sdb/device/queue_depth
# /fix

# existing code
exit 0
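
A variant of the same idea that also reads the value back after writing it is sketched below. Setting `queue_depth` to 1 effectively turns off NCQ for the disk, which fits the `ncq` command timeouts in the log above. The function name `set_queue_depth` and the `SYSFS_ROOT` override are assumptions introduced for this example (the override only exists so the function can be tried against a scratch directory). The device names are the ones from this thread.

```shell
#!/bin/sh
# Hypothetical override; on a real system this stays at /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# set_queue_depth DEPTH DEV...
# Write DEPTH to each disk's queue_depth file and print the value read back.
set_queue_depth() {
    depth="$1"
    shift
    for dev in "$@"; do
        f="$SYSFS_ROOT/block/$dev/device/queue_depth"
        if [ -w "$f" ]; then
            echo "$depth" > "$f"
            echo "$dev: queue_depth=$(cat "$f")"
        else
            # Disk missing or file not writable: report it and carry on
            echo "$dev: $f not writable, skipped" >&2
        fi
    done
}

# Example (as root): set_queue_depth 1 sda sdb
```

The setting does not survive a reboot, which is why it has to be reapplied at startup, for example from /etc/rc.local as shown above.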