SATA HDD issues

I wish we had an answer already. I am planning to buy two HDDs, but I am waiting to find out whether there is a compatibility issue.

What I found is working:

  • one SATA-connected drive mounted and running does not cause any problems for me (btrfs defrag + LXC containers + media server etc. all running together)
  • two drives on SATA ports only sharing data over the network (the CPU limits network transfers to about 30 MB/s, so the PCIe bus is not overloaded; no services other than Samba cause any drive reads or writes)
  • transfers between a USB 3.0 drive and the SATA-connected drives
  • rsync between two drives with the speed limited to about 15 MB/s (rsync --bwlimit=15M; see the example below)
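
The bandwidth-limited copy from the last item looks roughly like this for me (the paths are only placeholders for my mounts):

rsync -a --bwlimit=15M /mnt/sda/data/ /mnt/sdb/data/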

What is not working for me:

  • RAID sync between two drives on SATA ports (the drives transfer data at maximum speed, about 300 MB/s combined, which causes a kernel error and the SATA-connected drives are out of the game until the router is restarted; example commands after this list)
  • “btrfs scrub” on two drives at once (same as the previous note)
  • copying data at full speed between drives on SATA ports (same as the previous notes)
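
For reference, the failing operations were commands of roughly this kind; the device names, array name and mount points below are only placeholders for my setup:

# creating/syncing a two-drive software RAID1 (the "raid sync" case)
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
# scrubbing both btrfs filesystems at the same time
btrfs scrub start /mnt/disk1
btrfs scrub start /mnt/disk2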

Disabling NCQ with:

echo 1 > /sys/block/sda/device/queue_depth
echo 1 > /sys/block/sdb/device/queue_depth

makes mdraid creation and syncing work with no errors (so far).

I am wondering how to make this change permanent without the need to recompile the kernel (like in this answer).

The possibility to change some other kernel parameters permanently (kernel log buffer, cpufreq governor, libata parameters) would be much appreciated.
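
To illustrate what I mean: some of those can only go on the kernel command line, while others can already be changed from user space (the values below are only examples, not something I have verified on Turris):

# boot time only (kernel command line):
#   log_buf_len=1M         bigger kernel log buffer
#   libata.force=noncq     disable NCQ on all ports
# user space (sysfs), can be changed after boot:
echo ondemand > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo 1 > /sys/block/sda/device/queue_depth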


Isn’t it enough to add it to rc.local so it runs at boot time?

Some of those parameters must be set at boot time (e.g. the kernel log buffer). For others, a user-space change is sufficient. In general, the possibility to add/change kernel parameters at boot time would be great.
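
For the user-space ones, a minimal /etc/rc.local sketch could look like this (assuming sda and sdb are the SATA drives):

# /etc/rc.local, executed at the end of the boot process
echo 1 > /sys/block/sda/device/queue_depth
echo 1 > /sys/block/sdb/device/queue_depth
exit 0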

And it is more efficient to write

libata.force=noncq

than

for i in /sys/block/sd[abcde]; do echo 1 > $i/device/queue_depth; done

For me, it works to create the file /etc/modules.d/41-aaa-libata with:

libata force=1.5,noncq

This file will not be overwritten by the updater.
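
If you want to create it from the command line, something like this should do (the file name and option string are exactly the ones above):

cat > /etc/modules.d/41-aaa-libata <<'EOF'
libata force=1.5,noncq
EOF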


And what configuration/setup do you have?

  • single drive, two single drives (twins or different firmware), HW RAID, SW RAID…?

I have a single drive on a Turris 1.0 (not Omnia) with a mini PCIe SATA card (ASM1061).

I added these lines to /etc/rc.local and rebooted the router.
I am testing a full defrag on both internal drives at the moment and leaving it running overnight. It looks like it helped, because without it I was unable to complete the initial sync of a 40 GB RAID between two drives, while with NCQ turned off it finished without error.
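
In case somebody wants to repeat the test, the defrag I am running is roughly this (the mount points are mine, adjust them to yours):

btrfs filesystem defragment -r /mnt/sda
btrfs filesystem defragment -r /mnt/sdb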

Does it affect SATA performance?

I tried it last night and it looks good: copying about 2 TB between two drives (as I need to repartition them) went without issues.
Regarding the performance drop: I don’t have full-blown statistics, but in my case it looks like a minor drop, from an average of 85 MB/s to approximately 80 MB/s.

root@Hugo:~# echo 1 > /sys/block/sdb/device/queue_depth
-bash: echo: write error: Invalid argument
root@Hugo:~# printf 1 > /sys/block/sdb/device/queue_depth
root@Hugo:~#

Copying 40 GB from sda, btrfs (Samsung SSD 750 EVO 120GB), to sdb, btrfs (WDC WD20EFRX-68EUZN0), was successful.
About 82 MB/s, with both cores 80-90% used.

@Drakula

  1. Initial sync of a 40 GB RAID on two drives … completed.

  2. two separate operations, one on each internal drive:

  • btrfs defragmentation … still running for about 16 hours without error
  • rsync of data between an external USB 3.0 drive and one internal drive … stopped it after 16 h, no error

Neither operation was possible without NCQ turned off: it was just a matter of 30 minutes to run into a SATA reset, and all data transfers stopped until I rebooted the Turris Omnia router.
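
For anyone repeating the RAID test: the progress and speed of the initial sync can be watched in /proc/mdstat, e.g.:

cat /proc/mdstat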

If you want to see all NCQ values on your system, try this one-line command:
for i in /sys/block/sd*/device/queue_depth; do echo $i; cat $i; done

command output example:
/sys/block/sda/device/queue_depth
1
/sys/block/sdb/device/queue_depth
1
/sys/block/sdc/device/queue_depth
1
/sys/block/sdd/device/queue_depth
1

I don’t have a full understanding of the matter, but this seems to be the solution to the whole RAID issue? What do you guys think?

It seems to be, but we need test results from other people who have the same problem.

Would you please explain this further? I don’t quite get your wording here.

I mean the situation with untouched Turris Omnia firmware. I could only dream about running something I/O-intensive on both internal drives together for more than 30 minutes, because it stopped with a SATA error every time. Now I have been running such a workload for more than 16 hours without error.
But let’s still wait for more tests.

It seems

echo 1 > /sys/block/sda/device/queue_depth
echo 1 > /sys/block/sdb/device/queue_depth

worked for me; I performed a complete scrub of the mdadm array

echo check > /sys/block/md0/md/sync_action

on 4 TB disks without any problem (over 4 hours of work). It seems NCQ is the culprit of the HDD problems.
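
If somebody wants to verify the result of such a check, this should work (md0 as in the command above; the counter stays at 0 when the array is consistent):

cat /sys/block/md0/md/mismatch_cnt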
