HW crypto - Marvell CESA working?


#1

Has anyone managed to get marvell_cesa working via cryptodev?
/dev/crypto exists, the marvell_cesa module is loaded, and /proc/crypto lists marvell_cesa entries instead of the generic kernel ones.
But openssl doesn’t load a cryptodev engine:

root@turris:~# openssl engine
(dynamic) Dynamic engine loading support


#4

Found it!
The openssl sources in turris-os don’t include the cryptodev.h header.
The problem is that I can’t seem to build packages: the libc I built is a different version from the one installed.
Installed: libc - 1.1.11-4
Built: libc - 1.1.11-3


#5

Maybe you need this branch. https://api.turris.cz/openwrt-repo/turris-stable/
If you use the medikit from the same folder all package versions should match.


#6

That’s for the older Turris (Freescale CPU based), not the Marvell-based Omnia.


#7

Use this one: https://api.turris.cz/openwrt-repo/omnia-stable/

It wasn’t so hard to remove “turris-stable” from the URL to see what is inside openwrt-repo :stuck_out_tongue:


#8

The packages in omnia-stable are too old.

I’ve solved the issue simply by installing:

opkg install kmod-crypto-ocf

and rebooting

After the reboot, the cryptodev engine works.
But performance is not that great:
~50mbps 1k packets aes-128-cbc
~47mbps 1k packets aes-256-cbc

I get better performance inside the Debian LXC container: ~59mbps 1k packets aes-128-cbc
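
For comparing figures like these with link speeds: openssl speed prints throughput in thousands of bytes per second, so converting to megabits per second is a multiplication by 8 and a division by 1000. A minimal sketch of that conversion; the function name is made up for illustration, and the sample value is only a round number, assuming the figures above are meant as megabytes per second.

```python
def kbytes_per_s_to_mbit_per_s(k_bytes_per_s: float) -> float:
    """Convert openssl speed's unit (1000s of bytes/s) to megabits/s."""
    bytes_per_s = k_bytes_per_s * 1000
    return bytes_per_s * 8 / 1_000_000

# Hypothetical reading: 50000.00k as openssl speed would print it (~50 MB/s).
print(kbytes_per_s_to_mbit_per_s(50000.0))  # 400.0
```

So if the numbers are megabytes per second, ~50 of them would be roughly 400 Mbit/s of cipher throughput, before any protocol overhead.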


#9

What is the performance without the cesa module loaded?


#10

without:

type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256 cbc 30844.58k 32352.51k 33027.24k 33210.71k 33243.14k

with:

type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 1166.66k 4620.93k 17200.47k 49008.30k 93656.41k
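
To make the two rows easier to compare, here is a small sketch that divides the "with" figures by the "without" figures per block size and reports the smallest block size where the offload comes out ahead. It takes the numbers at face value; the function and variable names are only illustrative.

```python
BLOCK_SIZES = [16, 64, 256, 1024, 8192]
# Figures copied from the two aes-256-cbc rows above (1000s of bytes/s).
WITHOUT_CESA = [30844.58, 32352.51, 33027.24, 33210.71, 33243.14]
WITH_CESA = [1166.66, 4620.93, 17200.47, 49008.30, 93656.41]

def speedups(baseline, accelerated):
    """Return accelerated/baseline for each block size."""
    return [a / b for a, b in zip(accelerated, baseline)]

def first_faster_block(sizes, ratios):
    """Smallest block size at which the accelerated run wins, or None."""
    for size, ratio in zip(sizes, ratios):
        if ratio > 1.0:
            return size
    return None

ratios = speedups(WITHOUT_CESA, WITH_CESA)
for size, ratio in zip(BLOCK_SIZES, ratios):
    print(f"{size:>5} bytes: {ratio:.2f}x")
print("offload wins from", first_faster_block(BLOCK_SIZES, ratios), "bytes")
```

With these figures the software path is far ahead for small blocks, and the engine only pays off from 1024-byte blocks upward.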


#11

What command do you use to test these speeds, with and without crypto?
I have a box running pfSense right now and would like to have a comparison.


#13

without cryptodev:

openssl speed aes-256-cbc

with cryptodev:

openssl speed -evp aes-256-cbc


#14

2.4GHz C2558 with pfSense loaded

[2.3.2-RELEASE][: openssl speed aes-256-cbc
Doing aes-256 cbc for 3s on 16 size blocks: 5386446 aes-256 cbc’s in 3.00s
Doing aes-256 cbc for 3s on 64 size blocks: 1552874 aes-256 cbc’s in 3.00s
Doing aes-256 cbc for 3s on 256 size blocks: 401610 aes-256 cbc’s in 3.00s
Doing aes-256 cbc for 3s on 1024 size blocks: 260036 aes-256 cbc’s in 3.00s
Doing aes-256 cbc for 3s on 8192 size blocks: 32978 aes-256 cbc’s in 3.00s
OpenSSL 1.0.1s-freebsd 1 Mar 2016
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
The ‘numbers’ are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256 cbc 28727.71k 33127.98k 34270.72k 88758.95k 90051.93k


#15

If you rerun the test with the -evp switch, it should use the AES-NI instructions and perform a lot faster.


#16

Supermicro C2758 motherboard running pfSense 2.3.2.

Here is my output with the above commands:
[2.3.2-RELEASE][admin@pfSense.localdomain]/root: openssl speed aes-256-cbc
Doing aes-256 cbc for 3s on 16 size blocks: 5624461 aes-256 cbc’s in 3.00s
Doing aes-256 cbc for 3s on 64 size blocks: 1546310 aes-256 cbc’s in 3.01s
Doing aes-256 cbc for 3s on 256 size blocks: 399500 aes-256 cbc’s in 3.00s
Doing aes-256 cbc for 3s on 1024 size blocks: 258261 aes-256 cbc’s in 2.99s
Doing aes-256 cbc for 3s on 8192 size blocks: 32763 aes-256 cbc’s in 3.00s
OpenSSL 1.0.1s-freebsd 1 Mar 2016
The ‘numbers’ are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256 cbc 29997.13k 32902.26k 34090.67k 88383.25k 89464.83k
[2.3.2-RELEASE][admin@pfSense.localdomain]/root: openssl speed -elapsed -evp aes-256-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 940842 aes-256-cbc’s in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 887360 aes-256-cbc’s in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 736515 aes-256-cbc’s in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 445157 aes-256-cbc’s in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 92036 aes-256-cbc’s in 3.00s
OpenSSL 1.0.1s-freebsd 1 Mar 2016
The ‘numbers’ are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 5017.82k 18930.35k 62849.28k 151946.92k 251319.64k
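
The summary row follows directly from the "Doing ..." lines: blocks processed times block size, divided by the measured seconds, expressed in 1000s of bytes. A small sketch that recomputes it from one such line; the regex and the function name are my own, not part of openssl.

```python
import re

# Matches lines such as:
#   Doing aes-256-cbc for 3s on 16 size blocks: 940842 aes-256-cbc's in 3.00s
DOING_RE = re.compile(
    r"on (?P<size>\d+) size blocks: (?P<count>\d+) .* in (?P<secs>[\d.]+)s"
)

def throughput_k(line: str) -> float:
    """Recompute throughput in 1000s of bytes/s from one 'Doing' line."""
    m = DOING_RE.search(line)
    if m is None:
        raise ValueError(f"not a 'Doing' line: {line!r}")
    size = int(m.group("size"))
    count = int(m.group("count"))
    secs = float(m.group("secs"))
    return size * count / secs / 1000

line = "Doing aes-256-cbc for 3s on 16 size blocks: 940842 aes-256-cbc's in 3.00s"
print(f"{throughput_k(line):.2f}k")  # 5017.82k
```

The "in 3.00s" part is the denominator, which is why it matters whether openssl measured CPU time or elapsed time there.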


#17

The same supermicro C2758 board result with aes-256-gcm:

[2.3.2-RELEASE][admin@pfSense.localdomain]/root: env OPENSSL_ia32cap=0 openssl speed -elapsed -evp aes-256-gcm
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-gcm for 3s on 16 size blocks: 4136182 aes-256-gcm’s in 3.00s
Doing aes-256-gcm for 3s on 64 size blocks: 1240438 aes-256-gcm’s in 3.00s
Doing aes-256-gcm for 3s on 256 size blocks: 324611 aes-256-gcm’s in 3.00s
Doing aes-256-gcm for 3s on 1024 size blocks: 82946 aes-256-gcm’s in 3.03s
Doing aes-256-gcm for 3s on 8192 size blocks: 10350 aes-256-gcm’s in 3.02s
OpenSSL 1.0.1s-freebsd 1 Mar 2016
The ‘numbers’ are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-gcm 22059.64k 26462.68k 27700.14k 28020.36k 28115.96k
[2.3.2-RELEASE][admin@pfSense.localdomain]/root: openssl speed -elapsed -evp aes-256-gcm
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-gcm for 3s on 16 size blocks: 20013605 aes-256-gcm’s in 3.00s
Doing aes-256-gcm for 3s on 64 size blocks: 8781653 aes-256-gcm’s in 3.00s
Doing aes-256-gcm for 3s on 256 size blocks: 2787748 aes-256-gcm’s in 3.01s
Doing aes-256-gcm for 3s on 1024 size blocks: 751487 aes-256-gcm’s in 3.01s
Doing aes-256-gcm for 3s on 8192 size blocks: 95512 aes-256-gcm’s in 3.00s
OpenSSL 1.0.1s-freebsd 1 Mar 2016
The ‘numbers’ are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-gcm 106739.23k 187341.93k 237269.94k 255841.31k 260811.43k


#18

For your reference, asrock N3150 motherboard running xpenology DSM 6.0.2:

admin@DiskStation:~$ openssl speed aes-256-cbc
Doing aes-256 cbc for 3s on 16 size blocks: 4375351 aes-256 cbc’s in 2.98s
Doing aes-256 cbc for 3s on 64 size blocks: 1248789 aes-256 cbc’s in 2.99s
Doing aes-256 cbc for 3s on 256 size blocks: 322656 aes-256 cbc’s in 2.98s
Doing aes-256 cbc for 3s on 1024 size blocks: 220279 aes-256 cbc’s in 3.00s
Doing aes-256 cbc for 3s on 8192 size blocks: 27925 aes-256 cbc’s in 3.00s
OpenSSL 1.0.2j-fips 26 Sep 2016
The ‘numbers’ are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256 cbc 23491.82k 26729.93k 27718.10k 75188.57k 76253.87k
admin@DiskStation:~$ openssl speed -elapsed -evp aes-256-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 25970494 aes-256-cbc’s in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 9717522 aes-256-cbc’s in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 2840404 aes-256-cbc’s in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 742252 aes-256-cbc’s in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 93931 aes-256-cbc’s in 3.00s
OpenSSL 1.0.2j-fips 26 Sep 2016
The ‘numbers’ are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 138509.30k 207307.14k 242381.14k 253355.35k 256494.25k


#19

Asrock N3150 running xpenology DSM 6.0.2 aes-256-gcm outputs:

admin@DiskStation:~$ OPENSSL_ia32cap=0 openssl speed -elapsed -evp aes-256-gcm
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-gcm for 3s on 16 size blocks: 3485519 aes-256-gcm’s in 3.00s
Doing aes-256-gcm for 3s on 64 size blocks: 1047222 aes-256-gcm’s in 3.00s
Doing aes-256-gcm for 3s on 256 size blocks: 272000 aes-256-gcm’s in 3.00s
Doing aes-256-gcm for 3s on 1024 size blocks: 68721 aes-256-gcm’s in 3.00s
Doing aes-256-gcm for 3s on 8192 size blocks: 8634 aes-256-gcm’s in 3.00s
OpenSSL 1.0.2j-fips 26 Sep 2016
The ‘numbers’ are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-gcm 18589.43k 22340.74k 23210.67k 23456.77k 23576.58k
admin@DiskStation:~$ openssl speed -elapsed -evp aes-256-gcm
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-gcm for 3s on 16 size blocks: 16866142 aes-256-gcm’s in 3.00s
Doing aes-256-gcm for 3s on 64 size blocks: 8257333 aes-256-gcm’s in 3.00s
Doing aes-256-gcm for 3s on 256 size blocks: 2757682 aes-256-gcm’s in 3.00s
Doing aes-256-gcm for 3s on 1024 size blocks: 759749 aes-256-gcm’s in 3.00s
Doing aes-256-gcm for 3s on 8192 size blocks: 97330 aes-256-gcm’s in 3.00s
OpenSSL 1.0.2j-fips 26 Sep 2016
The ‘numbers’ are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-gcm 89952.76k 176156.44k 235322.20k 259327.66k 265775.79k


#20

pfsense 2.3.2 uses openssl 1.0.1
synology DSM 6.0.2 uses openssl 1.0.2


#21

Are you sure this is right? IIRC, you have to use -elapsed to get openssl to measure actual time, and not CPU time in this case.
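
A rough illustration of why that matters, with time.sleep standing in for the wait on the crypto engine (that stand-in is an assumption for the demo, not what openssl does internally): while the process waits, wall time passes but hardly any CPU time is charged to it, so dividing by CPU time inflates the result.

```python
import time

def measure(work, n_bytes: int):
    """Run work() once; return (k bytes/s by CPU time, k bytes/s by wall time)."""
    cpu0, wall0 = time.process_time(), time.perf_counter()
    work()
    cpu = time.process_time() - cpu0
    wall = time.perf_counter() - wall0
    cpu = max(cpu, 1e-6)  # avoid dividing by zero when almost no CPU was used
    return n_bytes / cpu / 1000, n_bytes / wall / 1000

def fake_offload():
    # Stand-in for waiting on a crypto engine: the process sleeps,
    # so wall time passes but almost no CPU time is charged to it.
    time.sleep(0.2)

by_cpu, by_wall = measure(fake_offload, n_bytes=1_000_000)
print(f"by CPU time:  {by_cpu:.0f}k")
print(f"by wall time: {by_wall:.0f}k")
```

The CPU-time figure comes out far higher than the wall-time one for the same amount of data, which is the effect -elapsed is meant to avoid.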


#22

This is my Linksys WRT1200AC router running OpenWrt Chaos Calmer without hardware encryption.

openssl speed aes-128-cbc aes-256-cbc
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-128 cbc 22115.49k 23533.72k 23918.11k 23759.20k 23392.41k
aes-256 cbc 17021.01k 17849.96k 18528.97k 18411.86k 18751.19k

Results with -evp
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-128-cbc 57870.11k 166301.54k 500421.82k 2999330.13k 5462835.20k
aes-256-cbc 34949.69k 203289.60k 588880.00k 1769523.20k 8192819.20k

Results with -elapsed
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-128 cbc 22028.01k 23443.88k 23716.69k 23905.96k 23344.47k
aes-256 cbc 16927.14k 18014.42k 18365.10k 18414.25k 18614.95k

Results with -elapsed -evp
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-128-cbc 2164.35k 7273.90k 18415.27k 30216.19k 36088.49k
aes-256-cbc 2054.61k 6656.23k 15593.65k 23533.91k 27211.09k
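
One way to read the two -evp tables together: if the run without -elapsed divides by user CPU time and the run with -elapsed divides by wall time, their ratio roughly estimates the share of wall time the openssl process spent in user CPU. This is a back-of-the-envelope sketch that assumes both runs moved data at the same real rate; the names are illustrative.

```python
BLOCK_SIZES = [16, 64, 256, 1024, 8192]
# aes-128-cbc rows copied from the tables above (1000s of bytes/s).
EVP_CPU_TIME = [57870.11, 166301.54, 500421.82, 2999330.13, 5462835.20]
EVP_ELAPSED = [2164.35, 7273.90, 18415.27, 30216.19, 36088.49]

def cpu_share(by_cpu_time, by_elapsed):
    """Estimated fraction of wall time spent in user CPU, per block size."""
    return [e / c for c, e in zip(by_cpu_time, by_elapsed)]

for size, share in zip(BLOCK_SIZES, cpu_share(EVP_CPU_TIME, EVP_ELAPSED)):
    print(f"{size:>5} bytes: {share:.1%} of wall time in user CPU")
```

Under that assumption the user CPU share stays below 5% at every block size, so the multi-gigabyte -evp figures without -elapsed are not real throughput.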

Build Info:

OpenSSL 1.0.2g  1 Mar 2016
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,2,long) aes(partial) blowfish(ptr) 
compiler: ccache_cc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DZLIB_SHARED -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -I/home/jow/relman-cc.1/.cache/sdk/mvebu/generic/staging_dir/target-arm_cortex-a9+vfpv3_uClibc-0.9.33.2_eabi/usr/include -I/home/jow/relman-cc.1/.cache/sdk/mvebu/generic/staging_dir/target-arm_cortex-a9+vfpv3_uClibc-0.9.33.2_eabi/include -I/home/jow/relman-cc.1/.cache/sdk/mvebu/generic/staging_dir/toolchain-arm_cortex-a9+vfpv3_gcc-4.8-linaro_uClibc-0.9.33.2_eabi/usr/include -I/home/jow/relman-cc.1/.cache/sdk/mvebu/generic/staging_dir/toolchain-arm_cortex-a9+vfpv3_gcc-4.8-linaro_uClibc-0.9.33.2_eabi/include -DOPENSSL_SMALL_FOOTPRINT -DHAVE_CRYPTODEV -DOPENSSL_NO_ERR -DTERMIOS -Os -pipe -march=armv7-a -mtune=cortex-a9 -mfpu=vfpv3-d16 -fno-caller-saves -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -mfloat-abi=soft -fpic -fomit-frame-pointer -Wal

For reference, my desktop system (Debian jessie-backports, kernel 4.7) with an i3-6100T:

Results with -elapsed
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-128 cbc 126771.10k 141282.43k 144572.76k 146331.65k 146792.45k
aes-256 cbc 93610.62k 101680.19k 102504.11k 101647.02k 95764.48k

Results with -elapsed -evp
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-128-cbc 1041366.59k 1159505.11k 1202731.69k 1211473.24k 1214163.63k
aes-256 cbc 91820.65k 101493.80k 102946.30k 103063.89k 101029.21k

Build Info:

OpenSSL 1.0.2j  26 Sep 2016
built on: reproducible build, date unspecified
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) blowfish(idx) 
compiler: gcc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DL_ENDIAN -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -Wl,-z,relro -Wa,--noexecstack -Wall -DMD32_REG_T=int -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DRC4_ASM -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM

Thanks for the pfSense comparison. It seems these appliances can do everything I used OpenWrt for out of the box and are a lot more powerful, but also more expensive. I’m giving the SG-2220 appliance a try.

Does ssh run stably with marvell_cesa? With strongSwan the router sometimes reboots when IPsec starts, but is stable afterwards.
The reboot takes only a few seconds but can be seen on the console.


#23

There is not much under my control. I just copied and pasted the command.