SATA HDD issues

I have the same problem as @Maxmilian_Picmaus. Two HDD 3GB (ext4 2x2.8GB). The problem occurs more frequently on sdb after I connect sda to LXC, but it is not limited to the second drive. I tried both fixes; neither helped.

Just to understand the commonalities of the errors: What filesystem do you have? And you (both) are not using MD RAID, are you?

Btw. the error seems to be slightly different, so the question is whether it is the same error as the original SATA link resets.

Tomas

Not sure (all my issues are described somewhere above).
1. A single installed drive can run at any speed with NCQ on; rock stable.
2. Two single drives with GPT (ext2, ext3 or ext4) failed every time on the second channel very shortly after mount.
3. Two single drives (one GPT, the second MBR, losing some space) failed after about 24 hours of heavy load.
4. All of these were tried with one big partition or several small ones.
5. It is always the second channel that causes it (no matter which disk is connected).
6. When the same disk is connected via sata2usb, it works perfectly fine.
Note1: no RAID configuration.
Note2: a single drive on any channel can operate at 6,2G (at least it is detected as such) with NCQ turned on, with no issues. Any combination of two drives makes the drive on channel 2 fail (in my case, in every scenario I tried).
Note3: partitioning/formatting was tested with a direct plug, via a USB converter, and on a different host. Doing it directly on the Turris caused several issues depending on the partition/format type(s).

I noticed a slight difference in the log dump (but it is very similar). I thought it was just different text with the same meaning. Anyway, if you need some data/log collection, let me know in a private message.

-max-

@brill @Maxmilian_Picmaus

I am missing something important.

Here is the full command list:
echo "libata force=1.5" >> /etc/modules.d/40-aaa-libata
ln -sv /etc/modules.d/40-aaa-libata /etc/modules-boot.d/40-aaa-libata

This works for me both with the originally supplied 2-port SATA card and with the 4-port SATA miniPCIe card I bought.

Note: If you want, you can also turn off NCQ (libata force=1.5,noncq), but it is not necessary.
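
Because `>>` appends, running the commands above a second time leaves a duplicate line in the file. Below is a minimal sketch of a rerunnable variant. The function name `set_libata_force` and its directory parameters are made up for illustration; the default paths are the ones from the commands above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Turris OS): write the libata option file
# and link it into the boot-time module directory. Safe to run repeatedly.
#   $1  value for libata's "force" parameter, e.g. "1.5" or "1.5,noncq"
#   $2  modules directory (default: /etc/modules.d)
#   $3  boot modules directory (default: /etc/modules-boot.d)
set_libata_force() {
    force="$1"
    mod_dir="${2:-/etc/modules.d}"
    boot_dir="${3:-/etc/modules-boot.d}"
    conf="$mod_dir/40-aaa-libata"

    # ">" overwrites, so the file always holds exactly one line.
    echo "libata force=$force" > "$conf" || return 1
    # -f replaces an existing link instead of failing on it.
    ln -sf "$conf" "$boot_dir/40-aaa-libata"
}
```

With the defaults, `set_libata_force 1.5` (or `set_libata_force 1.5,noncq`) followed by a reboot should be equivalent to the two commands above.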

Thanks for the command list (I’ve done it exactly as shown).
I have also tried all the scenarios (1.5+ncq vs 1.5+noncq, 1.5 only), together with “queue_type”. I think I covered all single-drive scenarios and the recommended workarounds.

I am considering borrowing or buying another SATA controller, both to have a backup and to rule out a possibly faulty piece of hardware. I’ll wait for some official info or a fix.

Can you post a link to that miniPCIe card you bought?

I bought these items together:

https://www.aliexpress.com/item/PCI-Express-x1-6Gbps-Mini-PCIe-to-4-Ports-SATA-3-0-Adapter-Controller-Expansion-Raid/32793704153.html?spm=2114.13010608.0.0.lCpx5w

https://www.aliexpress.com/item/50cm-internal-Mini-SAS-4i-36Pin-SFF-8087-Host-to-4-SATA-7pin-hard-disk-target/1613006680.html?spm=2114.13010608.0.0.lCpx5w

2pcs
https://www.aliexpress.com/item/New-2-5-SSD-HDD-To-3-5-Drive-Bay-Plastic-Mounting-kit-Adapter-Bracket-Dock/32562816418.html?spm=2114.13010608.0.0.lCpx5w

You mentioned that you’ve bought the MiniPCIe to MiniSAS controller for the Turris Omnia.
According to AliExpress it uses an ASM1061 controller. But looking closely at the supplied pictures, it seems that a Marvell controller is installed. In addition, your SAS controller offers 4 SATA ports, while the ASM1061 can only supply 2 SATA ports, which supports my assumption.
Can you confirm this?

If it is a Marvell controller, did you ever see the NCQ problems at all?

To be clear: I bought the Turris Omnia as a late replacement for a Hummingboard i.MX6 Quad that also used an ASM1061R 2-port SATA controller with RAID support. And I faced exactly the same issues the forum sees here. I recall that the ASM1061 controller destroyed one brand new 2.5" Seagate disk, and even after the replacement arrived I saw a ton of NCQ problems. Therefore I assume that there won’t be a clean solution, more like a workaround.

Sadly, all of the cheap MiniPCIe 2-port SATA controllers use the ASM1061, but if yours doesn’t, it would be great to know how it behaves and whether the workaround is truly required, especially considering the price.

The only other solution would be the Innodisk EMPS-3401, which uses a Marvell 88SE9215, or the IOCrest IO-mPCE9215-4I, which most likely also uses a Marvell, but both controller boards are priced above $33, making them less attractive.

Just as a hint: search for ASM1061 + NCQ or ASM1061 + Linux on the web and you will see that a lot of people have been complaining for a while …

I received exactly the same item as you can see in the picture.

[quote="psiegl, post:130, topic:1173"]
If it is a Marvell controller, did you ever see the NCQ problems at all?
[/quote]

This controller has no NCQ issues, but it is not without issues. You must still limit the SATA link speed to 1.5G; if you do not, this controller can reset itself, and then it should work, if I remember correctly. The original controller supplied with the Turris Omnia tried the same thing, but it did not succeed and hung the system, so you must reboot the Omnia.
To make both controllers work, you must run the commands below and reboot:
echo "libata force=1.5" >> /etc/modules.d/40-aaa-libata
ln -sv /etc/modules.d/40-aaa-libata /etc/modules-boot.d/40-aaa-libata
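
After the reboot it may be worth checking that the limit actually took effect. The sketch below pulls the negotiated speed per port out of the kernel log. The function name `sata_link_speeds` is made up, and it assumes the usual libata message format (`ataN: SATA link up X Gbps ...`); other kernel versions may word it differently.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print one
# "<port> <speed>" line per "SATA link up" message.
sata_link_speeds() {
    sed -n 's/.*\(ata[0-9][0-9]*\): SATA link up \([0-9.]* Gbps\).*/\1 \2/p'
}
```

Run it as `dmesg | sata_link_speeds`. With the limit applied, each connected port should report 1.5 Gbps.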

I bought it because 3.5" drives make too much noise, and with 3.5" to 2.5" adapters I have space for 4 drives. But you must mount this card on higher spacers because the onboard RTC backup battery is in the way.
The last thing I want to add: with the speed limit set, this card works without problems. I ran a btrfs scrub on all drives at once while writing this reply and it finished without any checksum errors. Everything is fine.

scrub status for xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    scrub started at Fri May  5 19:50:07 2017 and finished after 00:01:32
    total bytes scrubbed: 10.72GiB with 0 errors
scrub status for xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    scrub started at Fri May  5 19:50:07 2017 and finished after 01:37:53
    total bytes scrubbed: 1.15TiB with 0 errors
scrub status for xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    scrub started at Fri May  5 19:50:07 2017 and finished after 00:00:22
    total bytes scrubbed: 1.89GiB with 0 errors
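
For repeated checks, the error counts can be summed automatically instead of read by eye. This is a minimal sketch that assumes the output format shown above (`total bytes scrubbed: ... with N errors`); other btrfs-progs versions may print a different format. The function name `scrub_error_total` and the mount point in the usage line are made up.

```shell
#!/bin/sh
# Hypothetical helper: read "btrfs scrub status" output on stdin and
# print the sum of all "with N errors" counts (0 if none are found).
scrub_error_total() {
    sed -n 's/.*total bytes scrubbed: .* with \([0-9][0-9]*\) errors.*/\1/p' |
        awk '{ sum += $1 } END { print sum + 0 }'
}
```

For example: `btrfs scrub status /mnt/data | scrub_error_total`, where `/mnt/data` stands for your own mount point.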

Since no sharp picture of the controller exists, and mine arrived today, I would like to share a picture here.

As expected, it is a Marvell controller, model no. 88SE9215. This means no AES and no HW RAID, but four SATA ports at 6Gb/s are supported.

Encountered the problem w/ two 2T and two 4T Seagate drives in two different Omnia devices (mine and a friend’s). The 4T HDDs are new, the 2T HDDs are years old.

So, is the problem still not fixed? Or is there any solution to get RAID 1 on 4 TB disks working 100% with mirroring and sync? Thanks.

If you read carefully, yes, the problem has been fixed.
You just need to be able to apply the fix.

Not fixed. Just an ugly workaround.

Yes workaround - disabling NCQ helps in my case.
echo 1 > /sys/block/sda/device/queue_depth
echo 1 > /sys/block/sdb/device/queue_depth

In my case I also have to disable hd-idle in LuCI and enable it over SSH in the config file (2x 3TB WD Red). With this config it works fine for me.
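
The two `echo` commands above can be generalized to every `sd*` disk with a small loop. This is only a sketch: the function name `set_queue_depth` and its optional sysfs-root parameter are made up. Note that the setting does not survive a reboot, so it would have to be reapplied at boot, for example from `/etc/rc.local`.

```shell
#!/bin/sh
# Hypothetical helper: set the queue depth for every sd* disk.
#   $1  queue depth (1 effectively disables NCQ)
#   $2  sysfs root (default: /sys); lets you try it on a copy first
set_queue_depth() {
    depth="$1"
    sys_root="${2:-/sys}"
    for f in "$sys_root"/block/sd*/device/queue_depth; do
        # If no disk matches, the pattern stays unexpanded; skip it.
        [ -e "$f" ] || continue
        echo "$depth" > "$f"
    done
}
```

`set_queue_depth 1` then does the same as the two commands above, for however many disks are present.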

Still having problems. Any news from Team Turris?

What happened to the bug report? The bug is still here but the report is gone.
Any news about this issue?

Yeah. You can see this:

Fix is coming! :slight_smile:

Again workaround - disabling NCQ.

So after the update to 3.8.3, nothing changed. Shortly after mounting, the kernel log shows the same messages as before:

[details=Summary]
[319397.558578] Buffer I/O error on dev sdb1, logical block 1, async page read
[319397.565635] sd 1:0:0:0: [sdb] tag#19 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
[319397.565642] sd 1:0:0:0: [sdb] tag#19 CDB: opcode=0x88 88 00 00 00 00 00 00 00 08 02 00 00 00 06 00 00
[319397.565646] blk_update_request: I/O error, dev sdb, sector 2050
[319397.571669] Buffer I/O error on dev sdb1, logical block 2, async page read
[319397.578657] Buffer I/O error on dev sdb1, logical block 3, async page read
[319397.585645] Buffer I/O error on dev sdb1, logical block 4, async page read
[319397.592622] Buffer I/O error on dev sdb1, logical block 5, async page read
[319397.599607] Buffer I/O error on dev sdb1, logical block 6, async page read
[319397.606591] Buffer I/O error on dev sdb1, logical block 7, async page read
[/details]

tbh: I have less time to play with it, so for now I don’t care; I’m running on a single drive.
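
For anyone comparing logs like the one above across drives or firmware versions, a per-device count of the `Buffer I/O error` lines can be easier to compare than raw dumps. A minimal sketch, assuming the message format shown in the log above; the function name `count_buffer_io_errors` is made up.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print
# "<device> <count>" for each device with "Buffer I/O error" lines.
count_buffer_io_errors() {
    sed -n 's/.*Buffer I\/O error on dev \([^,]*\),.*/\1/p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}
```

Run it as `dmesg | count_buffer_io_errors`. Devices without such errors are simply not listed.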