[poll] OpenWrt (subsequent TOS) curtailing CPU performance for Turris Omnia

I am sorry, I do not get your argument here. You seem pretty combative, and I see no reason for that. If I have offended you, please accept my apology, I really do not want this to escalate. Point being, a fix for a real show-stopper bug had some side-effects: not nice, but certainly no catastrophe IMHO. I have a feeling that improving that situation by ameliorating the side-effects is better achieved by not alienating those whose help would be required to do so.

I do not believe that the omnia was sold on the promise to guarantee the availability of any specific feature for any specified, let alone in an upstream project. But I also believe that team turris might be convincible to address this issue, as they do have an interest in keeping the turris devices working well. (But they also receive severe criticism when and where they diverge from upstream OpenWrt, which means short of fixing this situation upstream one fraction of omnia users will be unhappy).

That’s what I mean with “combativeness”. I wanted to indicate that I would be happier with 385+ devices operating at higher performance, and from your post I assumed you would too, so hence I used “we” to describe us as members of the set of people wanting a solution to the issue that would not sacrifice the omnia’s performance. It seems I misinterpreted your position, sorry.

Because I understand how TOS is based on and coupled with OpenWrt, and that divergence between the two is a pretty problematic issue. It took time and some pains by team turris to get TOS better aligned with OpenWrt proper. Here the issue is that, to maintain alignment, TOS will need to find an agreement with upstream that everybody can live with. But as long as the performance sacrifice has not been quantified all of this discussion is rather theoretical… Also, just because I own an omnia does not mean that I need to stop caring about the OpenWrt ecosystem in general.

Please, re-read what you just wrote and just think how you would behave if you owned a 370 device.

Sorry, by being unnecessarily combative you missed that I am not arguing against re-instating the more performant compiler settings for the omnia, I just believe this needs to be done with the turris and OpenWrt developers, not against them (and I also accept that the side-effects of that might be unacceptable to either team; whoever maintains code gets a say in decisions that affect maintainability IMHO, and the number of targets/archs/platforms certainly carries a cost).
But then again without an actual assessment of the lost performance our argument is a bit academic. If you would/could show that omnia performance in normal use-cases would drop, by sat 50% I am sure you would get the attention of relevant developers, just arguing in the abstract about some loss in performance is far less effective IMHO.

agree, this is irrelevant without benchmark numbers

Summary, you:

  • qualify the poll as bad taste and of questionable objective
  • qualify others in the course of the discourse as combatants
  • qualify the matter as theoretical, academic, unfortunate, not nice but non-catastrophic
  • qualify 50% (!) in performance drop as some legitimate threshold to get anyone’s attention
  • prefer to offload the benchmarking to the user
  • reckon that code maintenance burden outweighs code performance
  • assert that CZ.NIC is under no obligation towards their users to prevent diminution of paid hardware through software
  • assert that an O device owner should think about some hardware that is not even incorporated in the device

That sounds very well, legitimate and amicable. What has been your actual and constructive action however, aside from discoursing here?

What others have done:

  • reached out to CZ.NIC that yielded

For now, we are using what OpenWrt uses and there are no plans for changing that.

  • commented on the OpenWrt commit which yielded zero response from the developers
  • opened a thread in the OpenWrt forum which yielded zero response from the developers
  • opened a report in OpenWrt’s bug tracker that has been dismissed as won’t fix

Do you intend to:

  • carry out the benchmarking, being an O owner?
  • issue a PR with OpenWrt or CZ.NIC and convince either otherwise?

Are you going to provide some? There is probably a reason (performance?) for CPUs supporting Neon/vfpd32 as opposed to just vfpv3-d16.

I stand by that.

Read what I wrote: I called you combative, which is not the same as calling you a combatant, and I did not say it of others, just of you in the singular.

“sat 50%” which should have been “say 50%” was not intended as an exact threshold but just as an exemplary number for which I would think that developers would care.

No, not “the user”, specifically you. You are making a big fuss and behaving belligerently without actually being sure whether there is any objective rationale to complain in the first place.

Again, no, I reckon that the developers are the arbiters of such decisions, and I mentioned that maintenance cost is an important factor. I am not the developer here, so it is not my call to make.

Yes, unless you have a specific contract with CZ.NIC your demands will not be enforceable by law. But more importantly, by phrasing this as a demand you are not winning the hearts and minds of those that can/will/might help on the code side of the problem…

The point is the change got introduced in OpenWrt because it fixes a massive problem for some devices; whether you as omnia owner care for those devices or not is of no importance to the OpenWrt developers. So your argument, “undo that fix, because it potentially costs some performance on my omnia, who cares about armada 370”, is unlikely to gain traction over there. Might want to rethink/rephrase your proposition?

This is another example of what I call a combative attitude…

See, maintenance cost is important to developers after all.

If I see unexplained poor performance on my omnia I might start looking into that, but I realistically do not expect that. I have enough on my plate that I do not need to go looking for issues.

Sure, once you demonstrate that this is a real quantifiable problem with a sufficiently high performance cost, I will talk to folks, not that my voice carries weight around here. But I am not going to spend time on your pet peeve unless you demonstrate this to be more than a theoretical consideration by giving data that measures the performance cost.

See, open source development works by scratching one’s own itches (or gently convincing people that they share an itch), so far I see only a theoretical issue. Feel free to convince me other-wise with real data (and maybe cool off for a bit and come back with less of an entitled attitude).

How does that relate to open source? You mean Neon/vfpd32 is an itch?


Done by others - and yet you qualify all of those persons just the same.


It will be demonstrating itself soon enough.


And a litigator on top of all.

Just keep on going with your qualifications, you seem really big on those.

You might want to google for that phrase. This is about how free open source development works. None of the OpenWrt developers owe you anything and most develop this in their own time and hence on their own money. The presumption in open source development is that if something causes you unhappiness you go and fix it and then share that fix with others. That initial cause for unhappiness is the “itch” and fixing it yourself is the analog to scratching; at that point the metaphor fails, as the share-with-others part does not work well with either itches or scratches. So create a fix and post it on the OpenWrt/Turris developer lists. If that fix is actually good, you might see it committed even when presented with an attitude, but if the fix is far from perfect, or maybe just an idea you expect someone else to actually implement, a more polite and civil approach seems to be better IMHO.
And on that note I will end my part of our conversation, as I have said all I have to say, and unfortunately did not manage to increase the level of discourse (my fault).

You keep on going about attitude, unhappiness and other such objections like a behavioural analyst, nothing that either relates to the technicality of the code introduction or the poll.

You have expressed disagreement on the poll and contributed expertly to the community.

Let the poll roll, shall we?

This is only about toolchain compilation which implies 0 to none performance impact. You obviously have no idea what it is about so the other guy was right calling you entitled.

just shows that you have no clue about CPU instructions being compiled into the code.

In this particular instance the compiled code does forgo half of the CPU’s available FPU registers as well as NEON optimisations.

In a broad stroke it would be like compiling code with a 32-bit instruction set for a 64-bit capable CPU, forgoing the benefits of 64-bit code performance.

Why do you think there are CPUs with more advanced instruction sets (ARM 385) than others (ARM 370) - just some sort of fancy marketing by CPU manufacturers?

then you have actual numbers on the performance drop as you compared the differently compiled code and you are just teasing everybody? Or you don’t and it is a non issue until proven otherwise?

With curtailed CPU instructions compiled into the code the full CPU capabilities are logically diminished, unless you reckon that more advanced CPU instruction sets are just some sort of fancy marketing and do not actually contribute to the performance, do you?


Your prerogative of course to believe that there is no performance impact. Else, please see

We may disagree on whether the manufacturer or the user has to demonstrate the performance impact. But since you are opposing the notion of a performance impact you could always demonstrate it otherwise.

How hard is it to provide TOS without the slowdown patch?
Is the patch currently implemented in HBT so I could compare performance on 4.0.5 and 5.0.0 ?
(and if so, how could I measure it?)

As moeller0 tried to explain, it seems confusing why anyone would try to fix it when there is no proven issue yet; you act just on your feeling and did not do anything yourself. You are not forced to update to an undesirable version of open source code, are you?
Also if I understand correctly the commit message, it was already 16-bit before, so you should be able to list the performance increase since the previous version?

It was patched to cover for an issue with the ARM 370 series, there was no issue with the ARM 385. But since OpenWrt does not differentiate between the two CPU classes it impacts the ARM 385 code compilation as well.
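The difference between the two CPU classes is visible from userspace in the `Features` line of `/proc/cpuinfo`, where the Linux kernel lists flags such as `neon`, `vfpv3`, `vfpv3d16` and `vfpd32`. The following is a minimal sketch for reading those flags; the two sample strings are illustrative assumptions written for this example, not captured from real Armada 385 or Armada 370 devices, and the function names are hypothetical.

```python
def fpu_features(cpuinfo_text):
    """Return the set of flags from the first 'Features' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            return set(value.split())
    return set()


def describe(flags):
    """Summarise the floating-point/SIMD capability implied by the flags."""
    if "neon" in flags:
        return "NEON available (32 double registers)"
    if "vfpv3d16" in flags:
        return "VFPv3 with 16 double registers, no NEON"
    if "vfpd32" in flags:
        return "VFP with 32 double registers, no NEON"
    return "no VFPv3/NEON flags found"


# Illustrative inputs only (assumed, not measured on real hardware).
SAMPLE_WITH_NEON = "Features\t: half thumb fastmult vfp edsp neon vfpv3 tls vfpd32\n"
SAMPLE_D16_ONLY = "Features\t: half thumb fastmult vfp edsp vfpv3 vfpv3d16 tls\n"

for sample in (SAMPLE_WITH_NEON, SAMPLE_D16_ONLY):
    print(describe(fpu_features(sample)))
```

On a router the same functions could be fed the contents of `/proc/cpuinfo` instead of the sample strings. This only shows what the CPU offers, not what the installed binaries were compiled to use.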


Entirely beside the point of the poll.


Is your perspective

?

Time for some data:

$ ./arm-openwrt-linux-objdump -d libgcrypt.so.20.0.1 | grep -i vadd
   3d7f8:       f27688a6        vadd.i64        d24, d22, d22
   44c78:       f2366848        vadd.i64        q3, q3, q4
   44ce4:       f2366848        vadd.i64        q3, q3, q4
   44d3c:       f2366848        vadd.i64        q3, q3, q4
   44d58:       f22488e0        vadd.i32        q4, q10, q8
   44d5c:       f26a28e6        vadd.i32        q9, q13, q11
   44db0:       f220e8c8        vadd.i32        q7, q8, q4
   44db4:       f26628ca        vadd.i32        q9, q11, q5
   44df8:       f228e844        vadd.i32        q7, q4, q2
   44e00:       f26a284c        vadd.i32        q9, q5, q6
   44e60:       f264484e        vadd.i32        q10, q2, q7
   44e68:       f26c8862        vadd.i32        q12, q6, q9
...

$ ./arm-openwrt-linux-objdump -d libgcrypt.so.20.0.1 | grep -i vadd | wc -l
325

So, at least some libraries make actual use of NEON instructions.

The kernel, on the other hand, seems to not use any:

$ ./arm-openwrt-linux-objdump -d ip_tables.ko | grep -i vadd
$ ./arm-openwrt-linux-objdump -d ip_tables.ko | grep -i vmul
$ ./arm-openwrt-linux-objdump -d ip_tables.ko | grep -i vabs
$ ./arm-openwrt-linux-objdump -d ip_tables.ko | grep -i vand
$ ./arm-openwrt-linux-objdump -d ip_tables.ko | grep -i i64
$ ./arm-openwrt-linux-objdump -d ip_tables.ko | grep -i i32
$

So, if my little experiment is generalizable, the change could leave at least the routing performance of the router unchanged. Running userspace apps could get less efficient, though.
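To repeat the experiment above over more than one mnemonic at a time, the `objdump -d` output can be classified by a small script. This is a rough sketch under stated assumptions: it expects ARM-mode listings in the same layout as shown above (address, one 8-digit encoding, mnemonic, operands), and the patterns are a heuristic chosen for this example, not a complete decoder. It counts instructions that look like NEON (q registers or a single integer element suffix such as `.i32`) and, separately, instructions that touch `d16` to `d31`, which do not exist with a `vfpv3-d16` setting. The `add` line in the sample is a hypothetical non-SIMD line added for contrast; the other lines are taken from the listing above.

```python
import re
from collections import Counter

# Heuristic patterns (assumptions for this sketch, not an exhaustive decoder).
INT_SIMD_MNEMONIC = re.compile(r"^v[a-z0-9]+\.[iusp](8|16|32|64)$")  # e.g. vadd.i32
Q_REGISTER = re.compile(r"\bq\d+\b")                                 # 128-bit registers
HIGH_D_REGISTER = re.compile(r"\bd(1[6-9]|2\d|3[01])\b")             # d16..d31


def classify(disassembly):
    """Count NEON-looking instructions and uses of d16..d31 in objdump -d text."""
    counts = Counter()
    for line in disassembly.splitlines():
        fields = line.split(None, 3)
        # Instruction lines look like: "<addr>: <encoding> <mnemonic> <operands>"
        if len(fields) < 3 or not fields[0].endswith(":"):
            continue
        mnemonic = fields[2]
        operands = fields[3] if len(fields) > 3 else ""
        if not mnemonic.startswith("v"):
            continue
        if Q_REGISTER.search(operands) or INT_SIMD_MNEMONIC.match(mnemonic):
            counts["neon"] += 1
        if HIGH_D_REGISTER.search(operands):
            counts["high_d_regs"] += 1
    return counts


SAMPLE = """\
   3d7f8:       f27688a6        vadd.i64        d24, d22, d22
   44c78:       f2366848        vadd.i64        q3, q3, q4
   44d58:       f22488e0        vadd.i32        q4, q10, q8
    1000:       e0811002        add     r1, r1, r2
"""

print(dict(classify(SAMPLE)))
```

In practice the text would come from running the cross `objdump -d` on each library or kernel module and passing its output to `classify`. A non-zero count only says the instructions are present in the binary, not how often they are executed.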

probably most those involving mathematical operations with

, e.g. randomness, checksums, cryptography

anyone? I plan to move to HBT anyway, so I’d like to help with any benchmarks, if this can tell us anything…

The baseline should be the same OS version, with same settings/userland, otherwise the benchmark gets skewed.


That requires compiling the kernel and userland with the NEON instruction set instead of the introduced patch.

iiuc you don’t have time to test, so it’s the least we can compare for now.

doesn’t it just require skipping the patch (a one-line change)?

and, btw I didn’t get the answer to:

is it? or how can I find out?

is it possible for us to check the differences in performance somehow?
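One possible way to compare two builds is a small timing harness run unchanged on both, keeping OS version, settings and userland the same as suggested earlier in the thread. The sketch below assumes Python 3 is installed on the router; the two workloads (hashing a buffer and a floating-point loop) and all function names are hypothetical choices for illustration. Whether either workload is actually sensitive to the FPU compiler setting is itself an assumption that the measurement would have to confirm, since libraries may use hand-written code paths that do not depend on compiler flags.

```python
import hashlib
import time


def best_time(func, repeats=5):
    """Run func several times and return the fastest wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best


def sha256_workload(size=1 << 20):
    """Hash a zero-filled buffer of the given size (exercises the crypto library)."""
    data = bytes(size)
    return lambda: hashlib.sha256(data).digest()


def float_workload(n=100_000):
    """Simple floating-point loop (exercises the interpreter's compiled code)."""
    def run():
        acc = 0.0
        for i in range(1, n):
            acc += (i * 0.5) / (i + 1.0)
        return acc
    return run


def relative_change(baseline, candidate):
    """Percent change in runtime; positive means the candidate build is slower."""
    return (candidate - baseline) / baseline * 100.0


if __name__ == "__main__":
    for name, work in (("sha256", sha256_workload()), ("float", float_workload())):
        print(f"{name}: {best_time(work):.6f} s")
```

The printed times from the two builds can then be fed to `relative_change` to express the difference as a percentage, which is the kind of number the developers were asking for.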