SATA HDD issues

I can confirm that I’m having this problem as well, with two 3TB WD Red disks (someone has already reported this problem with this disk combo above).

Also, I’ve gone with btrfs for now, but it has also run into the same problem during heavy reading/writing.

As of now I have no stable way to use the NAS perk; can CZ.NIC please look into this issue?

There is a possibility for mdadm users to limit the array rebuild speed:
View rebuild speed limit:
sysctl dev.raid.speed_limit_max
Modify rebuild speed limit:
sysctl -w dev.raid.speed_limit_max=value

If it is set to some reasonable value, maybe that will at least let the array build. This is not a solution… only a workaround…
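
For example, capping the resync to roughly 10 MB/s could look like the lines below (the limits are in KiB/s, and 10000 is only an illustrative value; sysctl -w does not survive a reboot, so for persistence the setting would typically go into /etc/sysctl.conf):

# show both current limits (KiB/s)
sysctl dev.raid.speed_limit_min dev.raid.speed_limit_max
# cap the resync speed for the running system only
sysctl -w dev.raid.speed_limit_max=10000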

I am having the same issue as the rest. In an attempt to try as many suggestions as possible, I am currently [for the moment at least] having success using the process outlined by floflooo in the link below.
Basically, I removed a member of my array [sdb, as that was the one showing as faulty] and then followed the process to add in a new disk. So far my recovery is at 3.x% [a threshold I never reached with the other methods].
I’ll update again as it either fails or completes.
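
For reference, the general shape of such a remove-and-re-add procedure is sketched below (/dev/md0 and /dev/sdb1 are only example names, and this is not necessarily the exact sequence from the linked post):

# mark the misbehaving member as failed and remove it from the array
mdadm /dev/md0 --fail /dev/sdb1
mdadm /dev/md0 --remove /dev/sdb1
# add the replacement (or wiped) disk back; recovery starts automatically
mdadm /dev/md0 --add /dev/sdb1
# watch the recovery percentage
cat /proc/mdstat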

I got up to about 10%, but then the device went into a cyclical reboot [watched the lights reboot 3x before…]. I powered it down, reseated the mSATA adapter and booted back up. It loaded and resumed the recovery successfully… then failed as per the standard issue within the next 10 minutes. Sigh…

On a subsequent reboot, it mounted and began the recovery at 11.x%. Will keep watching it…

I have had to reboot two more times, but it is still progressing, currently ~20.x%… [another cyclical reboot around 60%, but it did complete and shows a healthy array as of this morning]

Why use mdadm when you have btrfs with RAID options directly within the system?

https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices
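
If anyone wants to try that route, a minimal two-disk btrfs RAID1 looks roughly like this (device names and mount point are only examples):

# mirror both data and metadata across the two disks
mkfs.btrfs -d raid1 -m raid1 /dev/sda /dev/sdb
# make the kernel aware of all btrfs member devices, then mount either one
btrfs device scan
mount /dev/sda /mnt/nas
# check how the data is distributed
btrfs filesystem show
btrfs filesystem df /mnt/nas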

Has anyone tried RAID 0 (striped) or JBOD?

Small update… The repaired (replacement) drive is back. Just to be sure, I tried preparing everything on Win10, or just creating raw partitions for later formatting on the Turris. Both ways ended in an I/O error. Once ready to mount, any second attempt to mount any partition ended with an I/O error. When preparing with mkfs, any second partition ended with a short write, and later with short reads/writes during the finalizing step; ioctl just returns an error. It doesn’t matter what partition I create, what filesystem or size I choose, or whether I prepare it directly, under Win10, or from some live CD… that disk has seen so many formats and a few dd zeroing passes by now :slight_smile:

When re-checked or re-plugged, everything seems fine at first glance, but it is not, no matter what I try.
At this stage I have connected the drive to the Turris using a SATA-to-USB cable and am trying to format it again.

Surprisingly, so far so good. The superblock was written correctly and the partition seems to be ready.
We’ll see later, after I put it back on the SATA controller and try to mount it.

[update] So, once done on the USB-to-SATA connector, all was OK: it mounted and I copied a lot of files over, just to stress it a bit. Now I will see how it behaves. Later I will re-plug it back via the SATA cable and will see more :slight_smile: . I am really starting to think there is something wrong with the second channel on that SATA controller when two devices are connected.

I checked this documentation https://www.kernel.org/doc/Documentation/kernel-parameters.txt for the libata.force part, but I can’t find where and how to apply it in the OpenWrt world of configs.
Should that “echo -n “1.5,noncq” > /sys/module/libata/parameters/force” command really be added in an S10* init script, or somewhere else and in a very different way?
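
From reading the docs, libata.force looks like a boot-time kernel parameter and /sys/module/libata/parameters/force appears to be read-only at runtime, so the echo probably will not work from an init script. One possible (untested) route, assuming the uboot-envtools package (fw_printenv/fw_setenv) is installed and that the Omnia’s boot script really takes its kernel command line from the bootargs variable, would be something like:

# inspect what the kernel was actually booted with
cat /proc/cmdline
# check the current U-Boot bootargs before touching anything
fw_printenv bootargs
# append the libata option (value per kernel-parameters.txt); verify carefully,
# since a wrong bootargs line can make the box unbootable
fw_setenv bootargs "$(fw_printenv -n bootargs) libata.force=1.5Gbps,noncq"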

I’m told (on IRC) that CZ.NIC is aware of the problem and they are trying to reproduce it, but no luck so far, maybe because of being busy with other stuff. So, all is not lost, but wait for more official communication to confirm any and all rumours, including this one.

Next update:
when connected via the SATA-to-USB cable, all is fine;
when connected directly to the second channel, it always gets an I/O error, whether mounted or not…

Just issuing ‘blkid’ and/or ‘fdisk -l’ triggers it; after that error, any other commands (hdparm, cfdisk, sfdisk, blkid, fdisk) fail as well…

From the kernel log:

[ 1626.240218] sd 1:0:0:0: [sdb] tag#30 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[ 1626.240228] sd 1:0:0:0: [sdb] tag#30 CDB: opcode=0x88 88 00 00 00 00 01 5d 50 a3 00 00 00 00 08 00 00
[ 1626.240233] blk_update_request: I/O error, dev sdb, sector 5860532992
[ 1626.246805] Buffer I/O error on dev sdb1, logical block 5860530944, async page read
[ 1626.254503] Buffer I/O error on dev sdb1, logical block 5860530945, async page read

As I did not try those /proc/cmdline parameters for the libata module, I instead tried to downgrade both disks to 1.5G speed using jumpers. Obviously, that did not solve the I/O error issue.

Of course, when reconnected via SATA-to-USB, everything is nice again. fsck tells me ‘all clean’, I can mount it, and the data seems to be fine.

So for now I will disconnect that disk, put the first one back to 3G speed, and wait for some official fix or note.

I did some testing and want to share some of the results.

I am having exactly the same issues as the OP and other posters, with 2x 4TB WD Reds in a RAID 1 scenario.
I originally thought it was an ASM controller problem, so I tried this one, with no luck. The TO did not even detect it (the kernel module was installed, of course).

Then I tried both controllers in my Debian desktop and both worked well. I waited only until 10% of the sync, with no errors and no slowdown. (I have little time to play with, so only 10%.)

My tentative conclusion is that the problem is in the TO hardware or the TO system.

Hi Quick, for me the issue only starts to appear after the first ~2TB of synchronized data… Will you have time to repeat your test with more data?

Hi,

I’ll probably do it within 1-2 weeks, with both controllers, if you don’t mind.

[UPDATE]
Well, I’ve got some time to spare and finished testing. Only the ASM controller (provided with the NAS perk) was used, in my Debian desktop. The SW RAID 1 synced to 100% with no errors.

Testing the SIL3132 controller no longer has any point.

The Turris Omnia RAID 1 issues are not related to the provided SATA controller. As I assumed above, it is a TO hardware or TO operating system issue.

Hi all,

Has anybody tried an external power source for the HDDs, i.e. not using the Omnia board for the power supply? I had a similar issue on my Turris 1.0 using the ASM controller, two 2TB WD Reds, and a cheap Chinese power source. I’m not sure, but I think I was not able to sync the disks after RAID creation. I changed the power source for the HDDs and also the cable from the adapter to the HDDs: first I used a cable similar to the one in the Omnia NAS box, with both HDD connectors on one cable; now I am using a “Y” cable for the HDD power and I have no problem… Good luck.

I think power is the most reasonable question right now. See Problems creating RAID1 and Filesystem. Where are the design documents with per-component power use or a power budget?

I tried to power the disks from a separate ATX power supply, and the problems were the same as when powered by the board.

We are looking into this issue, but we have had no luck reproducing it without using WD Red drives. So, is there anyone who has encountered this with any other drives (a different WD type or a different manufacturer)?

I am using external power for the HDDs (www.ebay.com/itm/272196914722).
I still have problems with RAID 1 creation.

I’ve used WD Green, Hitachi and Seagate drives, all with problems. I’ve also used dm-crypt, so maybe the combined load of CPU (encryption) + 2x HDD is enough to push the limits.
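
One crude way to test that hypothesis (only a sketch; /dev/sda and /dev/sdb are example devices, and only reads are used, so no data is touched) is to generate CPU and dual-disk load at the same time and see whether the errors appear even without RAID or dm-crypt involved:

# stream-read both disks in parallel to load the SATA controller
dd if=/dev/sda of=/dev/null bs=1M count=20000 &
dd if=/dev/sdb of=/dev/null bs=1M count=20000 &
# add CPU load roughly comparable to encryption by hashing a zero stream
dd if=/dev/zero bs=1M count=20000 | sha256sum &
wait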

Power is not the problem.
I left the two HDDs in a desktop case, powered by the desktop power supply, with the SATA cables connected to the TO. The RAID 1 sync failed very fast, at 0.3%.

2x ST3000VN007 Seagate IronWolf, no matter what combination/configuration…
/dev/sdb gives me an I/O error during or shortly after mounting; when connected as /dev/sda, everything is normal…
No RAID configuration; I tested many options, including another disk with the same firmware/revision.
Identical twins, stand-alone, and in the log I just saw something like “/dev/sdb … to big for me!” and “trying[16]”; later on an I/O error is shown…

I will check it once more after today’s system update… (I noticed some kmod updates…) so maybe I will be lucky this time.

/KYP