Problems creating RAID1 and Filesystem

Hi guys,
I’m having problems setting up a RAID1 with mdadm.
First, I created the array:

mdadm --create --verbose --level=1 --metadata=1.2 --raid-devices=2 /dev/md0 /dev/sda /dev/sdb

After some time, the resync process ends with sdb marked as faulty. I’m using two new 2 TB WD Red drives. I’ve even replaced the drive with another 2 TB WD Red, but the problem still persists.

However, two old 1 TB Seagate Barracuda drives work fine as RAID1 and sync perfectly.
SMART was fine for all HDDs.
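Besides dmesg, the md driver reports member state in /proc/mdstat, where a member that has been kicked out is flagged with "(F)" and a degraded two-disk mirror shows "[2/1] [U_]". The helper below is a hypothetical sketch (the function name and the sample text are illustrative, not captured from this thread) that lists the faulty members per array.

```python
import re

# An array line in /proc/mdstat looks like "md0 : active raid1 sdb[1](F) sda[0]".
ARRAY_RE = re.compile(r"^(md\d+)\s*:\s*(.*)$")
# One member entry, e.g. "sdb[1](F)"; the optional "(X)" suffix is a status flag,
# where "(F)" means faulty and "(S)" means spare.
MEMBER_RE = re.compile(r"([\w-]+)\[(\d+)\](\([A-Z]\))?")


def faulty_members(mdstat_text: str) -> dict[str, list[str]]:
    """Return {array name: [member devices flagged (F)]} from /proc/mdstat text."""
    result: dict[str, list[str]] = {}
    for line in mdstat_text.splitlines():
        match = ARRAY_RE.match(line)
        if not match:
            continue
        name, rest = match.groups()
        result[name] = [
            dev for dev, _slot, flag in MEMBER_RE.findall(rest) if flag == "(F)"
        ]
    return result


# Illustrative sample only: a two-disk RAID1 in which sdb has been kicked out,
# leaving the array degraded ([2/1] [U_]).
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdb[1](F) sda[0]
      1953383360 blocks super 1.2 [2/1] [U_]

unused devices: <none>
"""

print(faulty_members(SAMPLE))  # {'md0': ['sdb']}
```

On the router itself, the same function can be given the contents of /proc/mdstat instead of the sample string.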

dmesg output when creating the array with two 2 TB WD Reds:

[ 8099.473600] md: bind
[ 8099.473741] md: bind
[ 8099.474003] md/raid1:md0: not clean -- starting background reconstruction
[ 8099.474009] md/raid1:md0: active with 2 out of 2 mirrors
[ 8099.474060] md0: detected capacity change from 0 to 2000264560640
[ 8099.476993] md: resync of RAID array md0
[ 8099.477003] md: minimum guaranteed speed: 1000 KB/sec/disk.
[ 8099.477007] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
[ 8099.477012] md: using 128k window, over a total of 1953383360k.
[ 8150.080754] ata2.00: exception Emask 0x0 SAct 0x1e0 SErr 0x0 action 0x6 frozen
[ 8150.088011] ata2.00: cmd 61/80:28:00:7a:5e/00:00:00:00:00/40 tag 5 ncq 65536 out
[ 8150.088011] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 8150.102857] ata2.00: cmd 61/00:30:80:7a:5e/05:00:00:00:00/40 tag 6 ncq 655360 out
[ 8150.102857] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 8150.117778] ata2.00: cmd 61/00:38:80:7f:5e/07:00:00:00:00/40 tag 7 ncq 917504 out
[ 8150.117778] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 8150.132700] ata2.00: cmd 61/00:40:80:86:5e/03:00:00:00:00/40 tag 8 ncq 393216 out
[ 8150.132700] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 8150.147619] ata2: hard resetting link
[ 8150.690745] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 8155.690738] ata2.00: qc timeout (cmd 0xec)
[ 8155.690756] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 8155.690761] ata2.00: revalidation failed (errno=-5)
[ 8155.695656] ata2: hard resetting link
[ 8165.700736] ata2: softreset failed (1st FIS failed)
[ 8165.705635] ata2: hard resetting link
[ 8175.710739] ata2: softreset failed (1st FIS failed)
[ 8175.715653] ata2: hard resetting link
[ 8188.975249] turris-00000000: IN=eth1 OUT= MAC=d8:58:d7:00:44:3b:08:96:d7:40:6d:96:08:00 SRC=108.160.172.193 DST=192.168.178.51 LEN=40 TOS=0x00 PREC=0x00 TTL=109 ID=19790 DF PROTO=TCP SPT=443 DPT=49278 WINDOW=0 RES=0x00 RST URGP=0
[ 8210.720757] ata2: softreset failed (1st FIS failed)
[ 8210.725653] ata2: limiting SATA link speed to 3.0 Gbps
[ 8210.725659] ata2: hard resetting link
[ 8215.930746] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[ 8225.930739] ata2.00: qc timeout (cmd 0xec)
[ 8225.930756] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 8225.930761] ata2.00: revalidation failed (errno=-5)
[ 8225.935655] ata2: limiting SATA link speed to 1.5 Gbps
[ 8225.935663] ata2: hard resetting link
[ 8235.940735] ata2: softreset failed (1st FIS failed)
[ 8235.945633] ata2: hard resetting link
[ 8245.950735] ata2: softreset failed (1st FIS failed)
[ 8245.955633] ata2: hard resetting link
[ 8280.960738] ata2: softreset failed (1st FIS failed)
[ 8280.965657] ata2: hard resetting link
[ 8286.170743] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 8316.170740] ata2.00: qc timeout (cmd 0xec)
[ 8316.170756] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 8316.170761] ata2.00: revalidation failed (errno=-5)
[ 8316.175692] ata2.00: disabled
[ 8316.175705] ata2.00: device reported invalid CHS sector 0
[ 8316.175711] ata2.00: device reported invalid CHS sector 0
[ 8316.175715] ata2.00: device reported invalid CHS sector 0
[ 8316.175718] ata2.00: device reported invalid CHS sector 0
[ 8316.175731] ata2: hard resetting link
[ 8326.180735] ata2: softreset failed (1st FIS failed)
[ 8326.185669] ata2: hard resetting link
[ 8336.190734] ata2: softreset failed (1st FIS failed)
[ 8336.195632] ata2: hard resetting link
[ 8371.200735] ata2: softreset failed (1st FIS failed)
[ 8371.205636] ata2: hard resetting link
[ 8376.410742] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 8376.410774] ata2: EH complete
[ 8376.410812] sd 1:0:0:0: [sdb] tag#9 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[ 8376.410822] sd 1:0:0:0: [sdb] tag#9 CDB: opcode=0x2a 2a 00 00 5e 86 80 00 03 00 00
[ 8376.410828] blk_update_request: I/O error, dev sdb, sector 6194816
[ 8376.417063] sd 1:0:0:0: [sdb] tag#10 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[ 8376.417070] sd 1:0:0:0: [sdb] tag#10 CDB: opcode=0x2a 2a 00 00 5e 7f 80 00 07 00 00
[ 8376.417074] blk_update_request: I/O error, dev sdb, sector 6193024
[ 8376.423309] md/raid1:md0: Disk failure on sdb, disabling device.
[ 8376.423309] md/raid1:md0: Operation continuing on 1 devices.
[ 8376.423336] sd 1:0:0:0: [sdb] tag#11 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[ 8376.423341] sd 1:0:0:0: [sdb] tag#11 CDB: opcode=0x2a 2a 00 00 5e 7a 80 00 05 00 00
[ 8376.423343] blk_update_request: I/O error, dev sdb, sector 6191744
[ 8376.423364] sd 1:0:0:0: [sdb] tag#12 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[ 8376.423368] sd 1:0:0:0: [sdb] tag#12 CDB: opcode=0x2a 2a 00 00 5e 7a 00 00 00 80 00
[ 8376.423370] blk_update_request: I/O error, dev sdb, sector 6191616
[ 8376.447474] md: md0: resync interrupted.
[ 8437.120755] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 8437.127849] ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 20
[ 8437.127849] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 8437.141566] ata1: hard resetting link
[ 8447.150737] ata1: softreset failed (1st FIS failed)
[ 8447.155640] ata1: hard resetting link
[ 8457.160747] ata1: softreset failed (1st FIS failed)
[ 8457.165648] ata1: hard resetting link
[ 8492.170738] ata1: softreset failed (1st FIS failed)
[ 8492.175654] ata1: limiting SATA link speed to 3.0 Gbps
[ 8492.175660] ata1: hard resetting link
[ 8497.170776] ata1: softreset failed (1st FIS failed)
[ 8497.175915] ata1: reset failed, giving up
[ 8497.180182] ata1.00: disabled
[ 8497.180192] ata1.00: device reported invalid CHS sector 0
[ 8497.180211] ata1: EH complete
[ 8497.180248] blk_update_request: I/O error, dev sda, sector 8
[ 8497.185921] md: super_written gets error=-5
[ 8497.186397] sd 0:0:0:0: [sda] tag#22 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[ 8497.186410] sd 0:0:0:0: [sda] tag#22 CDB: opcode=0x28 28 00 00 04 00 00 00 00 08 00
[ 8497.186415] blk_update_request: I/O error, dev sda, sector 262144
[ 8497.186632] md: checkpointing resync of md0.
[ 8497.192550] Buffer I/O error on dev md0, logical block 0, async page read
[ 8497.193159] blk_update_request: I/O error, dev sda, sector 8
[ 8497.193161] md: super_written gets error=-5
[ 8497.193264] blk_update_request: I/O error, dev sda, sector 8
[ 8497.193266] md: super_written gets error=-5
[ 8497.193291] RAID1 conf printout:
[ 8497.193292] --- wd:1 rd:2
[ 8497.193294] disk 0, wo:0, o:1, dev:sda
[ 8497.193296] disk 1, wo:1, o:0, dev:sdb
[ 8497.211321] sd 0:0:0:0: [sda] tag#25 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[ 8497.211332] sd 0:0:0:0: [sda] tag#25 CDB: opcode=0x28 28 00 00 04 00 00 00 00 08 00
[ 8497.211337] blk_update_request: I/O error, dev sda, sector 262144
[ 8497.217450] Buffer I/O error on dev md0, logical block 0, async page read
[ 8497.225637] md0: unable to read partition table
[ 8497.226859] sd 0:0:0:0: [sda] tag#26 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[ 8497.226871] sd 0:0:0:0: [sda] tag#26 CDB: opcode=0x28 28 00 00 00 00 08 00 00 08 00
[ 8497.226876] blk_update_request: I/O error, dev sda, sector 8
[ 8497.230783] RAID1 conf printout:
[ 8497.230785] --- wd:1 rd:2
[ 8497.230788] disk 0, wo:0, o:1, dev:sda
[ 8497.230850] blk_update_request: I/O error, dev sda, sector 8
[ 8497.230851] md: super_written gets error=-5
[ 8497.230976] blk_update_request: I/O error, dev sda, sector 8
[ 8497.230977] md: super_written gets error=-5
[ 8497.231033] blk_update_request: I/O error, dev sda, sector 8
[ 8497.231035] md: super_written gets error=-5
[ 8497.231092] md: resync of RAID array md0
[ 8497.231094] md: minimum guaranteed speed: 1000 KB/sec/disk.
[ 8497.231095] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
[ 8497.231098] md: using 128k window, over a total of 1953383360k.
[ 8497.231099] md: resuming resync of md0 from checkpoint.
[ 8497.231338] md: md0: resync done.
[ 8497.231587] blk_update_request: I/O error, dev sda, sector 8
[ 8497.231589] md: super_written gets error=-5
[ 8497.231641] md: super_written gets error=-5
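The pattern in this log is easier to see once the noise is filtered out: ata2 (the port behind sdb) keeps failing to reset, steps its link down from 6.0 to 3.0 to 1.5 Gbps and is finally disabled, after which ata1 (sda) gives up as well. The sketch below is a hypothetical helper that extracts just those libata milestones; the event list it looks for is a choice made for this example, not an exhaustive set of libata messages.

```python
import re

# libata error-handling milestones worth tracking per port. The optional ".00"
# after the port number is the device suffix (e.g. "ata2.00: disabled").
EVENT_RE = re.compile(
    r"\]\s+(ata\d+)(?:\.\d+)?: "
    r"(SATA link up [\d.]+ Gbps"
    r"|limiting SATA link speed to [\d.]+ Gbps"
    r"|reset failed, giving up"
    r"|disabled)"
)


def link_events(dmesg_text: str) -> list[tuple[str, str]]:
    """Return (port, event) pairs in the order they appear in dmesg text."""
    return [(m.group(1), m.group(2)) for m in EVENT_RE.finditer(dmesg_text)]


# A subset of lines copied from the log above.
SAMPLE = """\
[ 8150.147619] ata2: hard resetting link
[ 8150.690745] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 8210.725653] ata2: limiting SATA link speed to 3.0 Gbps
[ 8225.935655] ata2: limiting SATA link speed to 1.5 Gbps
[ 8316.175692] ata2.00: disabled
[ 8497.175915] ata1: reset failed, giving up
[ 8497.180182] ata1.00: disabled
"""

for port, event in link_events(SAMPLE):
    print(f"{port}: {event}")
```

To run it against a live system, pass it the output of `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`.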

Any idea what’s wrong or what to do?

Hi, I had the same problem with 3 TB WD Reds and mdadm. I resolved it by using btrfs instead, as discussed here: SATA HDD issues. The drives seem to work fine; I have no idea why the Reds cause these errors…


Thanks for your reply. Will try that. Cheers!

Does anyone have experience with other drive capacities or manufacturers? Most of us have been trying WD Reds; has anyone tried HGST, Seagate or Toshiba NAS disks?

Has anyone tested SATA adapters with chipsets other than ASMedia (SiI3132, Marvell, …)? Or with an ASMedia chipset but from a different manufacturer?

I was planning to run RAID1 with two 4 or 6 TB disks, but I don’t want to spend a lot of money on HDDs only to find out it doesn’t work.

IMHO the adapter is the most likely culprit.

Can we please get official feedback from the Turris team on this issue? There are at least two other threads about it and no official information.


One theory on IRC is that there isn’t enough power for two 3.5" HDDs.

That is a good theory. I have two other theories:

  1. There is not enough bandwidth for two high-speed SATA connections at the same time, and sooner or later the first connection starves the second one.

If that is the case, one can try whether “libata.force=1.5Gbps” or “libata.force=noncq” as a kernel boot parameter helps.

  2. Overheating of the NAS box. Without a fan the temperature rises quite high, which can, for example, cause the drives to recalibrate frequently.
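A libata.force setting only has an effect if it actually reached the kernel command line, so after rebooting it is worth checking /proc/cmdline. The option takes a comma-separated list of settings, so both workarounds from the first theory can be combined in one option. This is a minimal, hypothetical sketch; the function name and the sample command line are illustrative, and how the boot arguments are set depends on your device.

```python
def libata_force_values(cmdline: str) -> list[str]:
    """Return every value passed via libata.force= on a kernel command line."""
    values: list[str] = []
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == "libata.force" and sep:
            # Several settings may be combined in one option, separated by commas.
            values.extend(v for v in val.split(",") if v)
    return values


# Illustrative command line; on the router, read the real one from /proc/cmdline.
SAMPLE = "console=ttyS0,115200 libata.force=1.5Gbps,noncq rootwait"
print(libata_force_values(SAMPLE))  # ['1.5Gbps', 'noncq']
```

An empty list means the kernel was booted without any libata.force override.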

More on this: miniPCIe has a maximum speed of 2.5 Gbps, while two SATA1 interfaces add up to 3 Gbps, two SATA2 to 6 Gbps, and two SATA3 to 12 Gbps. All of those exceed what miniPCIe can provide.

The SATA card in the NAS perk is actually a RAID card, so it could provide enough bandwidth for two SATA2 or two SATA3 drives if it were used as such; but if you implement RAID in software, there is not enough bandwidth for two SATA2 or two SATA3 connections.

I haven’t seen a hardware RAID driver for the SATA card, so if there is none, the maximum speed is 2.5 Gbps for one drive or 1.25 Gbps per drive for two drives.
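The arithmetic behind this argument can be laid out as a small Python sketch. It only restates the raw rates quoted in the post (2.5 Gbps for the miniPCIe link, 1.5/3/6 Gbps per SATA generation) and assumes both drives try to run at full interface speed at the same time; the constant and function names are illustrative.

```python
# Raw link rates in Gbit/s, as quoted in the post above.
MINIPCIE_GBPS = 2.5
SATA_GBPS = {"SATA1": 1.5, "SATA2": 3.0, "SATA3": 6.0}


def aggregate_demand(generation: str, drives: int = 2) -> float:
    """Peak rate needed if every drive runs at full interface speed at once."""
    return SATA_GBPS[generation] * drives


def per_drive_share(drives: int = 2) -> float:
    """Equal share of the miniPCIe link when all drives are busy at once."""
    return MINIPCIE_GBPS / drives


for gen in SATA_GBPS:
    demand = aggregate_demand(gen)
    verdict = "exceeds" if demand > MINIPCIE_GBPS else "fits within"
    print(f"2x {gen}: {demand:.1f} Gbps {verdict} the {MINIPCIE_GBPS} Gbps link")

print(f"Per-drive share with 2 drives: {per_drive_share():.2f} Gbps")
```

These are interface ceilings, not what a spinning disk actually sustains, so the sketch shows the worst case described in the post rather than a measured load.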

I don’t think this causes the problem. I tried it with a powerful fan that kept the box quite cool, and the problem still appeared.

Power isn’t the problem. The power supply, if marked correctly, should provide 40 W. The board and antennas draw under 8 W (this was mentioned somewhere on this forum), and one WD Red should draw under 4 W.
I also tried powering the disks from a separate ATX power supply, and the problem remained.
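The power budget from this post can be checked with a few lines of Python. This is a minimal sketch using only the upper bounds quoted above (40 W supply, 8 W for board and antennas, 4 W per WD Red); the function name is illustrative.

```python
def power_headroom(psu_w: float, board_w: float, drive_w: float, drives: int) -> float:
    """Watts left over after the board and all drives draw their stated power."""
    return psu_w - board_w - drive_w * drives


# Upper bounds quoted in the post, so the result is a conservative estimate.
headroom = power_headroom(psu_w=40, board_w=8, drive_w=4, drives=2)
print(f"Steady-state headroom with two drives: {headroom} W")
```

These are steady-state figures. 3.5" drives draw considerably more while spinning up, so to check a boot-time scenario the 4 W figure would have to be replaced with the spin-up draw from the drive’s datasheet.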

Overheating most probably isn’t the problem either. I have a fan connected, and I also held my hand on the SATA card chips: they didn’t even get warm enough to be uncomfortable, yet the array still failed.

mdadm with two old WD VelociRaptors built OK.
A 4 TB WD Red + 4 TB Seagate NAS pair failed. An array with those disks built in a PC and then moved to the Turris reassembled OK. I copied about 1 TB of data to it over the network and it didn’t fail. It has now been running for 3 days since the last reboot without any problems, although I have only copied some data and watched movies over DLNA.

More on RAID problems in this thread: SATA HDD issues.

Could it be that the hardware briefly exceeds the 40 W when it boots, e.g. to get the disks spinning (or something else, I don’t know), and that this leads to one or both disks not being recognized? Unfortunately, I don’t have the tools to measure consumption.

You should not expect any problems related to drive capacity, drive brand, or power source.
What you should worry about is something that prevents using the Turris Omnia as a NAS in a RAID setup the way we know it from other brands.
The problem is the PCI Express bus, which is unreliable and disconnects your drives during the initial RAID sync, or later under high load such as btrfs scrubbing or when something in an LXC container wants to use the maximum HDD bandwidth.
I think something is wrong in the hardware design, because I have not found any information on how they intend to resolve this problem, even though they have confirmed they know about it.

I had an HDD stop responding even with an external power supply.

I had an HDD stop responding even with just one disk connected and doing nothing.

I had an HDD stop responding at room temperature.

My current working hypothesis is that the controller is shit. I’m waiting for a new one using a different chipset to see how it goes.


Probably related to https://forum.turris.cz/t/sata-hdd-issues/1173