eMMC is broken?


#1

Hi all!

The filesystem on my Omnia got corrupted and now the router doesn’t boot. Restoring with medkit didn’t help.

I connected over UART and saw this message: ERROR: Did not find a cmdline Flattened Device Tree

In the end I found out there is something weird with the eMMC.
I booted into the rescue shell and tried to recreate the partition on /dev/mmcblk0. I took all the commands for this operation from the rescue.sh script file.

So, recreating partitions with fdisk fails with an error message:
/dev/mmcblk0: close device failed: I/O error

It seems the eMMC has degraded too much and is not usable anymore, but I can’t prove this.
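One way to gather some evidence, as a minimal sketch: on Linux, eMMC 5.0 and newer devices expose the JEDEC wear estimates through sysfs as `life_time` and `pre_eol_info`. Whether the Omnia's chip reports them, and whether a failing chip still answers at all, is an assumption here; the function names below are only illustrative.

```python
from pathlib import Path

# JEDEC PRE_EOL_INFO codes (eMMC 5.0+).
PRE_EOL = {0x01: "normal", 0x02: "warning", 0x03: "urgent"}


def describe_life_time(raw: str) -> list[str]:
    """Decode the sysfs life_time string, e.g. '0x01 0x03' (type A, type B)."""
    result = []
    for token in raw.split():
        value = int(token, 16)
        if value == 0x00:
            result.append("not reported")
        elif 0x01 <= value <= 0x0A:
            # Each step covers 10% of the rated lifetime.
            result.append(f"{(value - 1) * 10}-{value * 10}% of rated life used")
        elif value == 0x0B:
            result.append("rated life exceeded")
        else:
            result.append("reserved value")
    return result


def describe_pre_eol(raw: str) -> str:
    """Decode the sysfs pre_eol_info string, e.g. '0x01'."""
    return PRE_EOL.get(int(raw.strip(), 16), "not reported")


def read_emmc_health(device: str = "mmcblk0") -> dict:
    """Read both attributes for one device; raises OSError if they are missing."""
    base = Path("/sys/block") / device / "device"
    return {
        "life_time": describe_life_time((base / "life_time").read_text()),
        "pre_eol_info": describe_pre_eol((base / "pre_eol_info").read_text()),
    }


if __name__ == "__main__":
    try:
        print(read_emmc_health())
    except OSError as err:
        # Older eMMC revisions or a dead chip will not provide these files.
        print(f"wear attributes not available: {err}")
```

If the files are missing, that by itself proves nothing about wear; it may only mean the chip predates eMMC 5.0.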

Is it possible to bring the router back to life, or is it completely broken and I need to replace the eMMC?


#2

First … there is an important question.
And the question is: “Do you use the LXC containers or NextCloud application on your Omnia router?”


#3

No, I didn’t.
I tried creating some and played around with them, but never kept them running for long.


#4

I believe I have the same issue here. I have my containers and rrd on an external 1 TB drive.

System started to degrade (some networking fell over) so I decided to reboot the router, since then nothing. Prior to the reboot, syslog showed errors similar to:
“BTRFS warning (device mmcblk0p1): csum failed ino 1144950 off 942080 csum 1102237448 expected csum 3388852613”
I initially tried a rollback to the previous snapshot, then a rollback to factory, and finally medkit. I’ve connected to the serial port, installed a TFTP server locally, and followed the instructions to network-boot Debian to see if I could recover the mmc filesystem, but nothing worked.

From the serial output I can see two issues. One is the following, and I see it at every reset/reboot in the first few lines:
“SF: Detected S25FL164K with page size 256 Bytes, erase size 64 KiB, total 8 MiB
*** Warning - bad CRC, using default environment”

And secondly, when I try the medkit, it is detected fine and repartitions the mmc, but then it fails to mount, resulting in a reboot.

Terminal capture here: https://pastebin.com/gv7YT0dS

I have managed to resuscitate an old router to get me back online, but this is quite disappointing…


#5

Hello guys,

I’m not aware of any eMMC failures that weren’t caused by LXC containers, and I’d like to tell you what happened and why. If you had or have any LXC containers, I need to explain why keeping them on the internal storage is a really bad idea.

Common GNU/Linux distributions in LXC containers are not designed to run on a router: their logs, and potentially their databases, write to storage very frequently. That’s why we keep system logs in RAM. This is covered in the articles in our documentation. First see Error/bug reporting and then LXC containers.

Keep in mind that LXC containers are not enabled by default. They require at least some knowledge of how to install one of the available Linux distribution images as an LXC container and how to use it.

Why did it happen?
The internal storage in the router is eMMC, which is flash memory, the same kind of memory used in micro SD cards, USB flash drives and so on. My point is that all of them have a limited number of write cycles and are not built for excessive amounts of writes, which wear them out; it is just a matter of time before that happens. The advantage of eMMC, as I see it, is that it is more reliable and faster than the devices I mentioned.
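To get a rough feel for how much is being written to the internal storage, here is a minimal sketch that reads the kernel's per-device I/O counters from `/sys/block/<device>/stat`. The seventh field is the number of sectors written since boot, always counted in 512-byte units. The device name `mmcblk0` and the function names are assumptions for illustration, and the number says nothing about write amplification inside the flash itself.

```python
from pathlib import Path

SECTOR_BYTES = 512  # the block layer's stat file always counts 512-byte sectors


def bytes_written(stat_line: str) -> int:
    """Return bytes written since boot from one /sys/block/<dev>/stat line."""
    fields = stat_line.split()
    # Field 7 (index 6) is "sectors written".
    return int(fields[6]) * SECTOR_BYTES


def read_bytes_written(device: str = "mmcblk0") -> int:
    """Read the counter for a device; raises OSError if it does not exist."""
    return bytes_written((Path("/sys/block") / device / "stat").read_text())


if __name__ == "__main__":
    try:
        total = read_bytes_written()
        print(f"{total / 2**20:.1f} MiB written since boot")
    except OSError as err:
        print(f"could not read write counter: {err}")
```

Running it twice, some hours apart, and comparing the two values gives an approximate write rate for whatever is currently running on the router.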

From both serial console outputs, I can see that both eMMCs are dead, which means all of your data is gone and we can’t recover it. The eMMC is soldered to the board and is not easy to replace; swapping it is almost impossible without expensive equipment. In our case, hopefully, the repair will be done by a third-party company, and it will be a paid repair, because we can’t cover this under warranty as it’s not the manufacturer’s fault.

If you’d like to have LXC containers on your router, that is completely fine, but you’ll need external removable storage. Even a USB drive would be OK for that, because they’re cheap and very easy to replace, just plug and play. Since Turris OS 3.10 you can use the Storage plugin in Foris to avoid any misconfiguration.

If you’d like to get your router working ASAP, you can boot from an mSATA SSD. The mSATA SSD should be inserted into the rightmost slot, next to the heatsink. Our documentation has more details on how to boot from an mSATA SSD. In the future, we’d like to offer the option to boot from a USB stick.

The most recent warning I can think of right now was in the release notes for Turris OS 3.10, which say that this situation can happen. The release notes are available in our documentation, where you can also find the Errata, a list of known bugs.

Once you created LXC containers in the CLI or GUI, you received a notification in Foris; if you have configured Foris to send notifications to your email address, you’ll find it in your email as well. This warning has been there for a long time, and it is described in the documentation for LXC containers. For the next version, Turris OS 3.10.6, which we’d like to release soon, we implemented more prominent warnings that are shown when the system detects that you’re running LXC containers on eMMC and tell you to use e.g. the Storage plugin.


#6

Thanks for an answer.

I had some containers, but I didn’t use them for a long time and they were permanently down for about a year. So I don’t think that they caused eMMC failure.

Btw, it is not a big problem for me to replace the dead eMMC. Maybe it will be cheaper than buying an mSATA SSD.
Finding exactly the same chip is much harder. So is it possible to use flash with a different capacity instead of the stock 8 GB? Maybe there are some similar chips with nearly the same specifications that I can use as a replacement?


#7

So to be clear, even though my containers were stored on an external drive, they still caused the eMMC failure?