HW crypto - Marvell CESA working?


#24

Try adding -elapsed to it - I’m pretty sure it’s not doing 22GB/s.


#25

updated post. Thanks


#26

[2.3.2-RELEASE]: openssl speed -elapsed -evp aes-256-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 938835 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 915252 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 749857 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 446561 aes-256-cbc's in 3.01s
Doing aes-256-cbc for 3s on 8192 size blocks: 92758 aes-256-cbc's in 3.00s
OpenSSL 1.0.1s-freebsd 1 Mar 2016
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type                16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
aes-256-cbc         5007.12k     19525.38k     63987.80k    152030.24k    253291.18k
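
For anyone wondering how the summary row relates to the "Doing ..." lines: each figure is simply blocks × block size ÷ elapsed seconds, shown in units of 1000 bytes/s. A minimal Python sketch of that arithmetic, with the counts copied from the run above (the function name is made up for illustration). The printed times are rounded to two decimals, so a recomputed value can differ slightly from what openssl shows, as it does for the 1024-byte row.

```python
def throughput_k(blocks: int, block_size: int, seconds: float) -> float:
    """Throughput in 1000s of bytes per second, the unit openssl speed prints."""
    return blocks * block_size / seconds / 1000.0


# (block size in bytes, blocks processed, elapsed seconds) from the run above
runs = [
    (16, 938835, 3.00),
    (64, 915252, 3.00),
    (256, 749857, 3.00),
    (1024, 446561, 3.01),
    (8192, 92758, 3.00),
]

for size, blocks, secs in runs:
    print(f"{size:>5} bytes: {throughput_k(blocks, size, secs):.2f}k")
```

So the 8192-byte result is roughly 253 MB/s of aes-256-cbc by wall-clock time.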


#27

I think your wrt1200ac is not optimized. Have you tried other firmware, such as DD-WRT, Tomato, or stock?

My Netgear R7000 does much better than yours running DD-WRT or Tomato.


#28

It’s running an older 3.18 kernel and only has the older mv_cesa module, which does not integrate with cryptodev. I tested the marvell_cesa module about half a year ago, but it was quite unstable back then. It can handle ~100 Mbit/s throughput with strongSwan, which is more than enough for my needs at the moment. With hardware encryption it was around twice that.
What are your openssl results on the r7000?


#29

Netgear R7000 running dd-wrt without cryptochip.

aes-256 cbc: 27368.36k

I don’t have a result with the crypto chip, as the router is currently running stock firmware as an AP, with no telnet or SSH access.

Compare that with your 18751.19k.


#30

The 18751.19k is the result with the generic encryption routines. The optimized result is the one with the -evp switch: 27211.09k. It does not use the crypto chip; I think in this case it uses the software encryption that comes with the kernel, via cryptodev.
However, the wrt1200ac CPU runs at 1.3GHz and the r7000 at 1GHz, and both are ARMv7, so on clock speed alone the wrt1200ac should be about 30% faster.


#31

The R7000 number was without -evp: plain openssl speed aes-256-cbc.
I expect it would be better with -evp.

I’ve found that firmware has a lot to do with performance as well. It is not just the hardware.


#32

My router was not completely idle while I ran the test, but the results did not differ much between multiple runs.

I may try a nightly build of OpenWrt on another wrt1200ac if I find the time, or give DD-WRT a try just out of curiosity. :wink:


#33

What I find most interesting on the Omnia is running openssl inside a container.
In my Debian 8 LXC container I get +37% compared to OpenWrt without cryptodev. That is actually better than cryptodev for aes-128-cbc.

openssl speed aes-128-cbc
Doing aes-128 cbc for 3s on 16 size blocks: 9505467 aes-128 cbc's in 2.93s
Doing aes-128 cbc for 3s on 64 size blocks: 2635793 aes-128 cbc's in 2.94s
Doing aes-128 cbc for 3s on 256 size blocks: 682854 aes-128 cbc's in 2.98s
Doing aes-128 cbc for 3s on 1024 size blocks: 171442 aes-128 cbc's in 2.96s
Doing aes-128 cbc for 3s on 8192 size blocks: 21436 aes-128 cbc's in 2.92s
OpenSSL 1.0.2j 26 Sep 2016
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) idea(int) blowfish(ptr)
compiler: gcc -I. -I… -I…/include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_PIC -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM -march=armv7-a -Wa,--noexecstack -O3 -Wall -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type                16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
aes-128 cbc        51906.99k     57377.81k     58661.28k     59309.66k     60138.26k

And these numbers could be improved if CZ.NIC compiled the kernel with NEON flags and OpenSSL with NEON assembler code:

-O3 -march=armv7-a -mtune=cortex-a9 -mfpu=neon -ftree-vectorize


#34

Odd that the FPU flags make such an impact here. -mfloat-abi is the C flag that differs between router OSes: OpenWrt uses soft, Turris uses hard, and DD-WRT uses softfp. All of them use -mfpu=vfpv3-d16 for openssl.


#35

I took my example from the Cortex-A8:
http://processors.wiki.ti.com/index.php/Cortex-A8#What_does_a_Neon_assembly_instruction_look_like


#36

Interesting, thank you for the link. I found this patch adding NEON asm optimizations to the kernel. It seems this one is already part of the 4.4 kernel Turris uses. http://lists.infradead.org/pipermail/linux-arm-kernel/2015-February/326079.html

Did you enter these compiler options under “Advanced configuration options/Target Options” when running make menuconfig?


#37

I can’t manage to build using the SDK :frowning: - the latest issues are with compat-wireless.


#38

Did you install all the feeds? Usually I only link the packages I want to modify from the feeds subdirs into the packages subdir. I saw your issue on the git repo. It may be that you just need ccache.
If you modify the kernel, having the full build environment is nice. You can try this first:

git clone -b test https://github.com/CZ-NIC/turris-os
cd turris-os/
./compile_omnia_fw

This should build everything.

Afterwards you can customize it using make menuconfig etc.


#39

I made a test-branch firmware image with the ocf module included and NEON enabled in the kernel.
Results were the same with or without -evp.
So I removed the no-hw flag from the openssl package Makefile, and now I get:

openssl speed -elapsed
type                16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
aes-128 cbc        28429.52k     29528.06k     29959.51k     29977.94k     30018.22k
aes-256 cbc        22039.50k     22349.97k     22521.60k     22540.29k     22579.88k

openssl speed -elapsed -evp
type                16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
aes-128-cbc          954.52k      3749.25k     13935.53k     34545.66k     87668.05k
aes-256-cbc          948.56k      3733.16k     13784.32k     33482.41k     80669.35k

openssl speed -elapsed -evp (without marvell_cesa loaded)
type                16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
aes-128-cbc         1673.80k      6045.35k     17509.38k     26534.91k     41986.73k
aes-256-cbc         1418.22k      5035.18k     14048.77k     20832.60k     31328.94k
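
To make the last two -evp tables easier to compare, here is a small Python sketch that divides the throughput with marvell_cesa loaded by the throughput without it, per block size. The numbers are copied from the tables above; the variable and function names are only illustrative.

```python
sizes = [16, 64, 256, 1024, 8192]

# Throughput in 1000s of bytes/s, from "openssl speed -elapsed -evp"
with_cesa = {
    "aes-128-cbc": [954.52, 3749.25, 13935.53, 34545.66, 87668.05],
    "aes-256-cbc": [948.56, 3733.16, 13784.32, 33482.41, 80669.35],
}
without_cesa = {
    "aes-128-cbc": [1673.80, 6045.35, 17509.38, 26534.91, 41986.73],
    "aes-256-cbc": [1418.22, 5035.18, 14048.77, 20832.60, 31328.94],
}


def speedup(cipher: str) -> list:
    """Ratio of throughput with the module loaded to throughput without it."""
    return [hw / sw for hw, sw in zip(with_cesa[cipher], without_cesa[cipher])]


for cipher in with_cesa:
    cells = ", ".join(f"{s}B: {r:.2f}x" for s, r in zip(sizes, speedup(cipher)))
    print(f"{cipher}: {cells}")
```

A ratio below 1 means the offload path is slower at that block size. With these numbers that is the case up to 256-byte blocks, and the module only pulls ahead from 1024 bytes upward, reaching a bit over 2x at 8192 bytes.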


#40

After running opkg install kmod-crypto-ocf, performance increases dramatically on my Omnia!

type                16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
aes-128-cbc        65255.68k    326756.80k   2453196.80k  14917836.80k  30412800.00k
aes-256-cbc        53027.47k    162090.40k    957501.44k   3543116.80k  13957120.00k

Or is this some measurement fault?


#41

Yes, it’s some kind of measurement fault :slight_smile:
To get something close to real-life values, insert “-elapsed” at the end of the speed command:
aes-128-cbc         1083.81k      4401.24k     17090.65k     50279.42k    102823.25k
aes-256-cbc         1059.57k      4381.95k     16075.09k     47757.65k     93320.53k
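
The inflated figures come from what is being timed: without -elapsed, openssl speed divides by user CPU time, and while the work is handed off to the kernel or the crypto engine the process itself uses very little of it. A rough Python illustration of the effect, using time.sleep as a stand-in for offloaded work. This is only an analogy, not what openssl does internally, and process_time counts user plus system CPU time rather than user time alone.

```python
import time


def fake_offloaded_work(chunks: int = 20, wait: float = 0.01) -> int:
    """Pretend to process 8192-byte chunks while something else does the work."""
    processed = 0
    for _ in range(chunks):
        time.sleep(wait)  # stand-in for waiting on the crypto engine
        processed += 8192
    return processed


cpu_start = time.process_time()   # CPU time consumed by this process
wall_start = time.perf_counter()  # wall-clock time

nbytes = fake_offloaded_work()

cpu_used = time.process_time() - cpu_start
wall_used = time.perf_counter() - wall_start

print(f"bytes/s by wall clock: {nbytes / wall_used:,.0f}")
# Guard against a CPU time too small to measure.
print(f"bytes/s by CPU time:   {nbytes / max(cpu_used, 1e-9):,.0f}")
```

The CPU-time figure comes out far larger than the wall-clock one even though the same number of bytes was "processed", which is the same kind of distortion as the multi-GB/s numbers above.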


#42

Does openvpn-openssl use evp by default after installing kmod-crypto-ocf? What about openvpn-polarssl?


#43

From what I know, openvpn-openssl uses the OpenSSL EVP interface by default.
You also have the option to request it explicitly with “engine cryptodev” in the server/client config.
PolarSSL doesn’t use cryptodev.
GnuTLS is also known to use cryptodev, but it needs recompilation, and in my tests I didn’t see any improvement in CLI benchmarks.

PS: Google OAuth seems broken; only Twitter login worked for me to log back in.