SATA HDD issues

Check the post from @cynerd in this thread. As far as I remember, they know about it and know how to reproduce it in a few test scenarios, but have not found a generic fix yet. No date has been given. We just have to wait.

@Maxmilian_Picmaus
They probably know about it, but I think this is a hardware design problem and it will never be fixed.

That’d be disappointing, very much so. I would expect an official and clear statement about this by now. :unamused:

On GitLab you can see that @brill is working on it.
His answer (copied from GitLab):

Not much to share yet. We are still experimenting with PCIe settings and we will dig into PCIe with Analyzer in order to capture the failures.

You (we) need to be patient. They will inform us about progress. Don’t worry.


I wish we had an answer already. I am planning on buying two HDDs but am waiting to find out whether there is a compatibility issue.

What I found to be working:

  • one SATA-connected drive, mounted and running, does not cause any problems for me (btrfs defrag + LXC containers + media server etc. all running together)
  • 2 drives on SATA ports only sharing data over the network (the CPU limits network transfers to about 30MB/s, so the PCIe bus is not overloaded; no service other than Samba is reading from or writing to the drives)
  • transfers between a USB 3.0 drive and SATA-connected drives
  • rsync between 2 drives with the speed limited to about 15MB/s (rsync --bwlimit=15M)

What is not working for me:

  • RAID sync between two drives on SATA ports (the drives transfer data at maximum speed, about 300MB/s combined, which causes a kernel error, and the SATA-connected drives stay unavailable until the router is restarted)
  • “btrfs scrub” on two drives at once (same as the previous item)
  • copying data at full speed between drives on SATA ports (same as the previous items)

Disabling NCQ with:

echo 1 > /sys/block/sda/device/queue_depth
echo 1 > /sys/block/sdb/device/queue_depth

lets mdraid creation and syncing complete with no errors (so far).

I am wondering how to make this change permanent without needing to recompile the kernel (like in this answer).

The possibility to change some other kernel parameters permanently (kernel log buffer, cpufreq governor, libata parameters) would be much appreciated.


Isn’t it enough to add it to rc.local so that it runs at boot time?

Some of those parameters must be set at boot time (e.g. kernel log buffer). For others a user-space change is sufficient. In general, the possibility to add/change kernel parameters at boot time would be great.

And it is more efficient to write

libata.force=noncq

than

for i in /sys/block/sd[abcde]; do echo 1 > "$i/device/queue_depth"; done

What works for me is creating the file /etc/modules.d/41-aaa-libata with:

libata force=1.5,noncq

This file will not be overwritten by the updater.
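One way to check after a reboot whether the module options were picked up is to look for libata's "FORCE:" notices in the kernel log. The helper below is only a sketch: `libata_force_lines` is a made-up name, and the assumption is that your kernel prints lines containing "FORCE:" when a force parameter is applied (the exact wording may differ between kernel versions).

```shell
# Hypothetical helper: keep only the libata "FORCE:" notices from kernel log
# text supplied on stdin. Assumes the kernel logs such lines when a
# libata force option (e.g. noncq or a 1.5 Gbps link limit) is applied.
libata_force_lines() {
    grep 'FORCE:'
}
```

On the router it would be used as `dmesg | libata_force_lines`; no output would suggest the options were not applied (or that the messages have already rotated out of the log buffer).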


And what configuration/setup do you have?

  • single drive, two single drives (twins or different fw), hw-raid, sw-raid…?

I have a single drive on a Turris 1.0 (not Omnia) with a mini PCIe SATA card (ASM1061).

I added these lines to /etc/rc.local and rebooted the router.
I am testing a full defrag on both internal drives at the moment and leaving it running overnight. It looks like it helped: without it I was unable to complete the initial sync of a 40GB RAID between two drives, while with NCQ turned off it finished without error.
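For anyone who wants to try the rc.local route, a minimal sketch of such an addition might look like this. `disable_ncq` is a hypothetical helper name, and the optional directory argument is only there so the function can be tried against a scratch directory instead of the real /sys/block. It uses printf rather than echo because one report in this thread had echo rejected with "Invalid argument".

```shell
# Sketch for /etc/rc.local: set queue_depth to 1 (NCQ effectively off)
# on every sd* disk. disable_ncq is a hypothetical helper name.
disable_ncq() {
    # Optional first argument: block-device root, defaults to /sys/block.
    root="${1:-/sys/block}"
    for dev in "$root"/sd*; do
        qd="$dev/device/queue_depth"
        # Skip an unmatched glob and devices without a writable queue_depth.
        [ -w "$qd" ] || continue
        # printf writes "1" without a trailing newline.
        printf 1 > "$qd"
    done
}
```

In /etc/rc.local the function would be followed by a plain `disable_ncq` call placed before the final `exit 0`. Unlike the libata module option, this only takes effect once rc.local runs, so the drives still use NCQ during early boot.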

Does it affect performance on SATA?

I tried it last night and it looks good: copying about 2 TB between two drives (as I need to repartition them) went without issues.
Regarding the performance drop: I don’t have full statistics, but in my case it looks like a minor drop, from an average of 85MB/s to approx. 80MB/s.

root@Hugo:~# echo 1 > /sys/block/sdb/device/queue_depth
-bash: echo: write error: Invalid argument
root@Hugo:~# printf 1 > /sys/block/sdb/device/queue_depth
root@Hugo:~#

Copying 40GB from sda - btrfs (Samsung SSD 750 EVO 120GB) to sdb - btrfs (WDC WD20EFRX-68EUZN0) was successful.
About 82MB/s, both cores 80-90% used.

@Drakula

  1. Initial sync of a 40GB RAID on 2 drives … completed.

  2. Two separate operations, one on each internal drive:

  • btrfs defragmentation … still running after about 16 hours without error
  • rsync of data between an external USB 3.0 drive and 1 internal drive … stopped it after 16h, no error

Neither operation was possible with NCQ left on. It took only about 30 minutes to run into a SATA reset, after which all data transfer stopped until I rebooted the Turris Omnia router.

If you want to see all NCQ values on your system, try this one-line command:
for i in /sys/block/sd*/device/queue_depth; do echo "$i"; cat "$i"; done

command output example:
/sys/block/sda/device/queue_depth
1
/sys/block/sdb/device/queue_depth
1
/sys/block/sdc/device/queue_depth
1
/sys/block/sdd/device/queue_depth
1
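If you prefer one line per drive, the same check can be sketched as a small function. `ncq_status` is a made-up name, the optional argument exists only so it can be pointed at a test directory, and treating a depth of 1 as "NCQ off" is the working assumption used throughout this thread.

```shell
# Hypothetical helper: print one summary line per sd* disk.
ncq_status() {
    # Optional first argument: block-device root, defaults to /sys/block.
    root="${1:-/sys/block}"
    for qd in "$root"/sd*/device/queue_depth; do
        # Skip an unmatched glob or unreadable entries.
        [ -r "$qd" ] || continue
        # Reduce e.g. /sys/block/sda/device/queue_depth to "sda".
        dev=${qd#"$root"/}
        dev=${dev%%/*}
        depth="$(cat "$qd")"
        if [ "$depth" -le 1 ]; then
            state="NCQ off"
        else
            state="NCQ on"
        fi
        printf '%s: queue_depth=%s (%s)\n' "$dev" "$depth" "$state"
    done
}
```

Running `ncq_status` with no argument reads the real /sys/block; it only reads the values and does not change anything.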

I don’t have a full understanding of the matter, but this seems to be the solution to the whole RAID issue? What do you guys think?