HW crypto - Marvell CESA working?

The odd thing is that with my test build the ocf module makes no difference: I get the same results with or without it loaded. Only after rebuilding openssl without the no-hw flag are the cryptodev modules used.
I also built gnutls with crypto support but have not run benchmarks yet.

no-hw removes cryptodev support, so this is expected.

PS: there is also a performance hit and a higher latency for small packets.

After the latest turris-os update, kmod-crypto-ocf was removed! There is no cryptodev anymore.

It's in the repo:
kmod-crypto-ocf - 4.4.35+15-1-34abcd5e548fc8ed5390269f3a31d173-15 - OCF modules

You just have to opkg install it.
Beware: it caused random reboots with the previous kernel.

Are there still issues with kmod-crypto-ocf (random reboots)? Can I use kmod-cryptodev instead? Which is better?

kmod-crypto-ocf - 4.4.38+1-1-34abcd5e548fc8ed5390269f3a31d173-1 - OCF modules
kmod-cryptodev - 4.4.38+1.8-mvebu-2 - This is a driver for that allows to use the Linux kernel supported hardware ciphers by user-space applications.

You need both for hw CESA to work. I don't know why…

Strange… According to OpenWrt wiki:
(https://wiki.openwrt.org/doc/hardware/cryptographic.hardware.accelerators)

Enabling /dev/crypto

Run make menuconfig and select

With cryptodev-linux
kmod-crypto-core: m
kmod-cryptodev: m

With OCF
This must not be combined with cryptodev-linux.
Kernel modules → Cryptographic API modules
kmod-crypto-core: m
kmod-crypto-ocf: m

Utilities
ocf-crypto-headers: m

I know the OpenWrt wiki article.
Maybe the Turris devs can share some info about this behavior.

There are two incompatible cryptodev implementations. One that needs OCF and the other cryptodev without OCF.
There is another option called AF_ALG.

The big problem with all three is latency: offloading will only be faster than the CPU if your packets are around 2000 bytes or larger. This is bigger than the MTU on most interfaces. You would gain a few percent idle at the cost of added latency and slower speed for single-threaded usage.

The CESA is designed with a "post your request and come back later" model in mind.

So the only application that would benefit is HDD encryption…

HTTPS would also benefit. Sending a big file could mean full Gigabit TLS speed.

HDD encryption would benefit even without cryptodev. The kernel will use marvell-cesa without it.
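To see which implementation the kernel would pick for a cipher, you can read /proc/crypto; for a given algorithm name the kernel prefers the entry with the highest priority. Here is a rough sketch of a parser for that. The driver name in SAMPLE is made up for illustration; check the real names on your own box.

```python
def parse_proc_crypto(text):
    """Split /proc/crypto text into one dict per registered implementation."""
    entries = []
    for block in text.strip().split("\n\n"):
        entry = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                entry[key.strip()] = value.strip()
        if entry:
            entries.append(entry)
    return entries


def implementations(entries, algorithm):
    """Return all entries for one algorithm name, highest priority first."""
    matches = [e for e in entries if e.get("name") == algorithm]
    return sorted(matches, key=lambda e: int(e.get("priority", "0")), reverse=True)


# Hypothetical excerpt, only to show the file layout; "example-hw-cbc-aes"
# is an invented driver name, not the real marvell-cesa one.
SAMPLE = """\
name         : cbc(aes)
driver       : example-hw-cbc-aes
module       : kernel
priority     : 300

name         : cbc(aes)
driver       : cbc(aes-generic)
module       : kernel
priority     : 100
"""

if __name__ == "__main__":
    try:
        with open("/proc/crypto") as handle:
            text = handle.read()
    except OSError:
        text = SAMPLE  # fall back to the illustrative sample
    for entry in implementations(parse_proc_crypto(text), "cbc(aes)"):
        print(entry.get("priority"), entry.get("driver"))
```

The first line printed is the driver that in-kernel users such as dm-crypt should get for cbc(aes).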

so no openvpn :frowning:

So with OpenVPN I won't get better performance, but will I get CPU offload?

  • CPU for LXC, and others
  • CESA for OpenVPN

You may still end up with a higher CPU load with offloading than without, because switching between kernel and user space and the key setup still take CPU time.

OpenSSL 1.0.2k 26 Jan 2017 on OpenWRT:
openssl speed -elapsed md5 sha1 sha256 sha512 aes-128-cbc aes-192-cbc aes-256-cbc rsa2048
type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md5                 11920.08k    38629.27k    96373.42k   156120.75k   187555.84k
sha1                 7931.32k    20200.87k    38828.20k    50833.41k    55833.94k
aes-128 cbc         37149.93k    40902.74k    42078.63k    42677.93k    42371.75k
aes-192 cbc         32349.19k    35378.20k    36829.44k    37113.71k    36948.65k
aes-256 cbc         29356.35k    31604.44k    32555.26k    32848.21k    32847.19k
sha256               9411.80k    20153.02k    33939.11k    40496.13k    42874.20k
sha512               1887.85k     7611.61k    10194.18k    13577.56k    15136.09k
                  sign    verify    sign/s verify/s
rsa 2048 bits 0.034315s 0.000875s     29.1   1143.2

OpenSSL 1.1.0e 16 Feb 2017 on LXC debian testing
openssl speed -elapsed md5 sha1 sha256 sha512 aes-128-cbc aes-192-cbc aes-256-cbc rsa2048
type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md5                 31325.70k    80877.70k   160978.69k   201437.53k   239099.90k
sha1                26711.07k    67869.08k   133916.67k   176733.18k   195865.26k
aes-128 cbc         51030.48k    55728.98k    57828.78k    58100.39k    58518.19k
aes-192 cbc         44298.30k    47816.47k    49265.07k    48639.66k    48878.93k
aes-256 cbc         38719.15k    41587.71k    42660.18k    43187.54k    42715.82k
sha256              16782.62k    38782.93k    72512.26k    91772.93k    98926.59k
sha512               6781.93k    27340.10k    39821.91k    55527.77k    62166.36k
                  sign    verify    sign/s verify/s
rsa 2048 bits 0.012674s 0.000309s     78.9   3234.2

yes,
those are the weirdest results :slight_smile:
lxc guest gets better numbers than host OS

It is not weird but expected. OpenWRT uses different compile time options and compiler parameters. The LXC one is fully optimized for speed. OpenWRT optimizes for a mixture of speed and size.

How do we get hw crypto to run inside a local container?

There is an engine called af_alg. This one works nearly everywhere.
Time used for normal tests is about 15 seconds user time. With af_alg this changes to about 8s user, 1s sys with 15s real.
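AF_ALG is a socket interface to the kernel crypto API, so a container can use it as long as the shared host kernel exposes it and nothing blocks the socket family. A minimal sketch to check this from inside a container, using Python's socket module (Linux only). Note that this only shows the kernel crypto API is reachable; which driver actually serves the request (CESA or plain software) is decided by the kernel.

```python
import hashlib
import socket


def afalg_sha256(data):
    """Hash data through the kernel crypto API.

    Returns the digest, or None if AF_ALG is not available here
    (non-Linux platform, missing kernel support, or blocked socket family).
    """
    family = getattr(socket, "AF_ALG", None)
    if family is None:
        return None
    try:
        with socket.socket(family, socket.SOCK_SEQPACKET, 0) as alg:
            alg.bind(("hash", "sha256"))
            op, _ = alg.accept()
            with op:
                op.sendall(data)
                return op.recv(32)  # a SHA-256 digest is 32 bytes
    except OSError:
        return None


if __name__ == "__main__":
    message = b"abc"
    digest = afalg_sha256(message)
    if digest is None:
        print("AF_ALG is not usable in this environment")
    else:
        matches = digest == hashlib.sha256(message).digest()
        print("AF_ALG works, digest matches hashlib:", matches)
```

If this prints that AF_ALG works inside the LXC guest, the kernel side is fine and what remains is getting your TLS library to use it.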

Host without af_alg:

md5              17275.91k    54242.52k   129705.73k   199594.67k   237985.79k
sha1             15254.19k    45999.79k   108085.93k   166523.22k   196116.48k
sha256           11945.99k    32213.99k    66494.55k    90710.02k   101367.81k
aes-128-cbc      48093.78k    55224.23k    57812.39k    58540.71k    58763.95k
aes-192-cbc      40916.72k    47099.54k    48988.33k    49636.69k    49916.59k
aes-256-cbc      37382.25k    41243.95k    42630.57k    42553.34k    42983.42k

Host with af_alg:

md5                412.31k     1257.39k     4959.15k    18179.41k    81805.31k
sha1               411.68k     1256.00k     4993.28k    18160.64k    79814.66k
sha256             404.95k     1249.92k     5004.54k    17945.94k    82367.83k
aes-128-cbc        770.69k     3081.07k    11760.55k    33311.06k    88560.98k
aes-192-cbc        780.85k     3070.34k    11741.27k    32514.05k    84159.15k
aes-256-cbc        787.19k     3074.79k    11663.62k    31967.91k    80450.90k

Container with af_alg:

md5                369.91k     1168.70k     4611.33k    17281.02k    80508.25k
sha1               358.55k     1145.58k     4565.67k    16940.37k    73752.58k
sha256             355.31k     1129.49k     4539.05k    16869.38k    79443.29k
aes-128-cbc        729.00k     2877.59k    10902.44k    31802.71k    87037.27k
aes-192-cbc        727.13k     2918.31k    10979.58k    31191.04k    82201.26k
aes-256-cbc        711.20k     2845.55k    10863.53k    30954.84k    79811.93k

All tests were done using

OpenSSL 1.0.2k  26 Jan 2017
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -D_FORTIFY_SOURCE=2 -march=armv7-a -mfloat-abi=hard -mfpu=vfpv3-d16 -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -Wl,-O1,--sort-common,--as-needed,-z,relro -O3 -Wall -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM
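The numbers above can be used to estimate where offloading starts to pay off. The sketch below assumes a very simple model, time per call = fixed overhead + size × per-byte cost, fitted to the smallest and largest block of the "Host with af_alg" aes-128-cbc row, and compares it with the software rate at 8192 bytes. The model is my assumption, not something openssl reports.

```python
# Throughput in 1000 bytes/s, copied from the aes-128-cbc rows above.
BLOCK_SIZES = [16, 64, 256, 1024, 8192]
SOFTWARE_K = [48093.78, 55224.23, 57812.39, 58540.71, 58763.95]
AF_ALG_K = [770.69, 3081.07, 11760.55, 33311.06, 88560.98]


def seconds_per_call(size, kbytes_per_s):
    """Time for one operation on `size` bytes at the given throughput."""
    return size / (kbytes_per_s * 1000.0)


def fit_overhead_model(sizes, rates_k):
    """Fit time = overhead + size * per_byte using the first and last block size."""
    t_small = seconds_per_call(sizes[0], rates_k[0])
    t_large = seconds_per_call(sizes[-1], rates_k[-1])
    per_byte = (t_large - t_small) / (sizes[-1] - sizes[0])
    overhead = t_small - per_byte * sizes[0]
    return overhead, per_byte


def break_even_size(overhead, hw_per_byte, sw_per_byte):
    """Block size above which the offloaded path is faster, or None if never."""
    if sw_per_byte <= hw_per_byte:
        return None
    return overhead / (sw_per_byte - hw_per_byte)


if __name__ == "__main__":
    overhead, hw_per_byte = fit_overhead_model(BLOCK_SIZES, AF_ALG_K)
    # Software cost per byte, taken from the large-block rate.
    sw_per_byte = 1.0 / (SOFTWARE_K[-1] * 1000.0)
    size = break_even_size(overhead, hw_per_byte, sw_per_byte)
    print("fixed overhead per call: %.1f us" % (overhead * 1e6))
    print("break-even block size: %.0f bytes" % size)
```

With these figures the break-even lands between 2000 and 3000 bytes, which fits the "around 2000 bytes" estimate earlier in the thread and is above a typical 1500 byte MTU.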

In-Kernel test using cryptsetup benchmark --cipher aes-cbc:
without marvell-cesa:
aes-cbc 256b 28.6 MiB/s 28.3 MiB/s
with marvell-cesa:
aes-cbc 256b 89.9 MiB/s 91.2 MiB/s
with 1/4 the sys time used.

Yes it is really fast but sadly only aes-cbc is really useful.

Is af_alg possible with the Omnia as host OS? How do I run it inside my Debian LXC?