[poll] OpenWrt (subsequent TOS) curtailing CPU performance for Turris Omnia

How hard is it to provide TOS without the slowdown patch?
Is the patch currently implemented in HBT, so that I could compare performance on 4.0.5 and 5.0.0?
(And if so, how could I measure it?)

As moeller0 tried to explain, it is confusing why anyone would try to fix this when there is no proven issue yet, when you are acting just on a feeling and have not done anything yourself. You are not forced to update to an undesirable version of open source code, are you?
Also, if I understand the commit message correctly, it was already 16-bit before, so you should be able to list the performance increase since the previous version?

It was patched to cover for an issue with the Armada 370 series; there was no issue with the Armada 385. But since OpenWrt does not differentiate between the two CPU classes, it impacts code compiled for the Armada 385 as well.
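For anyone unsure which class their own board falls into, a minimal sketch of a check follows. It assumes the kernel lists a `neon` flag on the `Features` line of `/proc/cpuinfo`, as 32-bit ARM Linux kernels usually do; the function name `has_neon` is made up for illustration.

```shell
#!/bin/sh
# has_neon [FILE]: succeed if the Features line of FILE mentions "neon".
# FILE defaults to /proc/cpuinfo; the optional argument exists only so the
# check can be tried against a saved copy from another device.
has_neon() {
    cpuinfo=${1:-/proc/cpuinfo}
    grep -i '^Features' "$cpuinfo" 2>/dev/null | grep -qw neon
}

if has_neon; then
    echo "NEON reported by the kernel"
else
    echo "no NEON reported"
fi
```

This only says what the hardware and kernel advertise; it says nothing about whether the installed packages were actually compiled to use NEON.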


Entirely beside the point of the poll.


Is your perspective

?

Time for some data:

$ ./arm-openwrt-linux-objdump -d libgcrypt.so.20.0.1 | grep -i vadd
   3d7f8:       f27688a6        vadd.i64        d24, d22, d22
   44c78:       f2366848        vadd.i64        q3, q3, q4
   44ce4:       f2366848        vadd.i64        q3, q3, q4
   44d3c:       f2366848        vadd.i64        q3, q3, q4
   44d58:       f22488e0        vadd.i32        q4, q10, q8
   44d5c:       f26a28e6        vadd.i32        q9, q13, q11
   44db0:       f220e8c8        vadd.i32        q7, q8, q4
   44db4:       f26628ca        vadd.i32        q9, q11, q5
   44df8:       f228e844        vadd.i32        q7, q4, q2
   44e00:       f26a284c        vadd.i32        q9, q5, q6
   44e60:       f264484e        vadd.i32        q10, q2, q7
   44e68:       f26c8862        vadd.i32        q12, q6, q9
...

$ ./arm-openwrt-linux-objdump -d libgcrypt.so.20.0.1 | grep -i vadd | wc -l
325

So, at least some libraries make actual use of NEON instructions.

The kernel, on the other hand, seems to not use any:

$ ./arm-openwrt-linux-objdump -d ip_tables.ko | grep -i vadd
$ ./arm-openwrt-linux-objdump -d ip_tables.ko | grep -i vmul
$ ./arm-openwrt-linux-objdump -d ip_tables.ko | grep -i vabs
$ ./arm-openwrt-linux-objdump -d ip_tables.ko | grep -i vand
$ ./arm-openwrt-linux-objdump -d ip_tables.ko | grep -i i64
$ ./arm-openwrt-linux-objdump -d ip_tables.ko | grep -i i32
$

So, if my little experiment is generalizable, the change could leave at least the routing performance of the router unchanged. Running userspace apps could get less efficient, though.
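To repeat the experiment over more files without typing one grep per mnemonic, something like the following could be used. It is only a sketch: the function name `count_neon` and the mnemonic list are my own assumptions, and the list is far from a complete NEON inventory. Note also that mnemonics such as `vadd` with `.f32`/`.f64` operands on `s`/`d` registers can be plain VFP rather than NEON, so the count is an upper bound.

```shell
#!/bin/sh
# count_neon: read `objdump -d` output on stdin and print the number of
# lines whose mnemonic is one of a few common vector operations.
# The mnemonic list is illustrative, not exhaustive.
count_neon() {
    # grep -c still prints 0 when nothing matches, but exits non-zero;
    # "|| true" keeps that case from being treated as a failure.
    grep -E -i -c '[[:space:]](vadd|vsub|vmul|vabs|vand|veor|vorr|vld1|vst1|vshl|vshr)(\.|[[:space:]])' || true
}

# Usage, with the same toolchain objdump as above:
#   for f in libgcrypt.so.20.0.1 ip_tables.ko; do
#       printf '%s: ' "$f"
#       ./arm-openwrt-linux-objdump -d "$f" | count_neon
#   done
```

Run over a whole rootfs, this would give a rough picture of which binaries stand to lose the most.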


probably mostly those involving mathematical operations with

, e.g. randomness, checksums, cryptography

anyone? I plan to move to HBT anyway, so I’d like to help with any benchmarks, if this can tell us anything…

The baseline should be the same OS version, with the same settings/userland; otherwise the benchmark gets skewed.
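As a rough idea of what such a comparison could look like, here is a sketch that times the same userspace workload on each build. The helper names `time_cmd` and `hash_workload` are hypothetical, and hashing zeros with `sha256sum` is just a stand-in; whether that particular tool benefits from NEON at all is an assumption that would need checking.

```shell
#!/bin/sh
# time_cmd N CMD [ARGS...]: run CMD N times and print the elapsed wall-clock
# time in whole seconds (date +%s resolution, so pick a large enough N).
time_cmd() {
    n=$1
    shift
    start=$(date +%s)
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" >/dev/null 2>&1
        i=$((i + 1))
    done
    end=$(date +%s)
    echo $((end - start))
}

# Hypothetical stand-in workload: hash 16 MiB of zeros.
hash_workload() {
    dd if=/dev/zero bs=1024k count=16 2>/dev/null | sha256sum
}

# Usage on each build, with identical settings and nothing else running:
#   time_cmd 20 hash_workload
```

Running it several times per build and comparing the spread, not just a single number, would make the result more trustworthy.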


That requires compiling the kernel and userland with the NEON instruction set enabled, instead of with the introduced patch.

iiuc you don’t have time to test, so it’s the least we can compare for now.

doesn’t it just require skipping the patch (a one-line change)?

and, btw, I didn’t get an answer to:

is it? or how can I find out?

is it possible for us to check the differences in performance somehow?

Maybe there is hope after all: it looks like a major contributor to the OpenWrt repo is taking another stab at providing more diversity in the CPU classes, see https://github.com/openwrt/openwrt/pull/3079 :crossed_fingers:


Closed the poll now, since it has run its course and, with the aforementioned commit in OpenWrt master, has become superfluous: the Omnia now benefits from NEON instructions (and all implied instructions) compiled into the code built from OpenWrt.

Thanks to those who elected to participate with a vote, as well as those who contributed to the discussion.