The odd thing is that with my test build the OCF module makes no difference. I get the same results with or without it loaded. Only after rebuilding OpenSSL without the no-hw flag are the cryptodev modules used.
I also built GnuTLS with crypto support but have not run benchmarks yet.
no-hw removes cryptodev support, so this is expected.
PS: there is also a performance hit and a higher latency for small packets.
After the latest turris-os update, kmod-crypto-ocf was removed! No cryptodev anymore.
It's in the repo:
kmod-crypto-ocf - 4.4.35+15-1-34abcd5e548fc8ed5390269f3a31d173-15 - OCF modules
You just have to opkg install it.
Beware: it caused random reboots with the previous kernel.
Are there still issues with kmod-crypto-ocf (random reboots)? Can I use kmod-cryptodev instead? Which is better?
kmod-crypto-ocf - 4.4.38+1-1-34abcd5e548fc8ed5390269f3a31d173-1 - OCF modules
kmod-cryptodev - 4.4.38+1.8-mvebu-2 - This is a driver for that allows to use the Linux kernel supported hardware ciphers by user-space applications.
You need both for HW CESA to work, don't know why…
Strange… According to OpenWrt wiki:
(https://wiki.openwrt.org/doc/hardware/cryptographic.hardware.accelerators)
Enabling /dev/crypto
Run make menuconfig and select
With cryptodev-linux
kmod-crypto-core: m
kmod-cryptodev: m
With OCF
This must not be combined with cryptodev-linux.
Kernel modules → Cryptographic API modules
kmod-crypto-core: m
kmod-crypto-ocf: m
Utilities
ocf-crypto-headers: m
I know the OpenWrt wiki article.
Maybe the Turris devs can share some info about this behavior.
There are two incompatible cryptodev implementations: one that needs OCF, and cryptodev-linux, which works without OCF.
There is another option called AF_ALG.
The big problem with all three is latency: offloading will only be faster than the CPU if your packets are around 2000 bytes or more. That is bigger than the MTU on most interfaces. You would gain a few percent idle at the cost of added latency and slower speed for single-threaded usage.
The CESA is designed around a "post your request and come back later" model.
So the only application that would benefit is HDD encryption…
HTTPS would also benefit. Sending a big file could mean full Gigabit TLS speed.
HDD encryption would benefit even without cryptodev. The kernel will use marvell-cesa without it.
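To see which implementation the kernel would pick for a given algorithm, you can look at /proc/crypto: the kernel crypto API uses the registered driver with the highest priority. Below is a minimal Python sketch that parses that file's format. The SAMPLE text, including the driver and module names in it, is illustrative only and not copied from an Omnia.

```python
def parse_proc_crypto(text):
    """Parse /proc/crypto-style text into a list of dicts, one per entry."""
    entries, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current algorithm entry.
            if current:
                entries.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        entries.append(current)
    return entries


def preferred_driver(entries, name):
    """Return the highest-priority driver for an algorithm name, or None."""
    candidates = [e for e in entries if e.get("name") == name and "driver" in e]
    if not candidates:
        return None
    best = max(candidates, key=lambda e: int(e.get("priority", "0")))
    return best["driver"]


# Illustrative sample only; the names and priorities are assumptions.
SAMPLE = """\
name         : cbc(aes)
driver       : mv-cbc-aes
module       : marvell_cesa
priority     : 300

name         : cbc(aes)
driver       : cbc-aes-generic
module       : kernel
priority     : 100
"""

print(preferred_driver(parse_proc_crypto(SAMPLE), "cbc(aes)"))
```

On the router itself you would pass open("/proc/crypto").read() instead of SAMPLE.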
So no OpenVPN.
So with OpenVPN I won't get better performance, but will I get CPU offload?
- CPU for LXC, and others
- CESA for OpenVPN
You may still end up with a higher CPU load with offloading than without, because switching between kernel and user space and the key setup still take CPU time.
OpenSSL 1.0.2k 26 Jan 2017 on OpenWRT:
openssl speed -elapsed md5 sha1 sha256 sha512 aes-128-cbc aes-192-cbc aes-256-cbc rsa2048
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md5             11920.08k    38629.27k    96373.42k   156120.75k   187555.84k
sha1             7931.32k    20200.87k    38828.20k    50833.41k    55833.94k
aes-128 cbc     37149.93k    40902.74k    42078.63k    42677.93k    42371.75k
aes-192 cbc     32349.19k    35378.20k    36829.44k    37113.71k    36948.65k
aes-256 cbc     29356.35k    31604.44k    32555.26k    32848.21k    32847.19k
sha256           9411.80k    20153.02k    33939.11k    40496.13k    42874.20k
sha512           1887.85k     7611.61k    10194.18k    13577.56k    15136.09k
                     sign       verify       sign/s     verify/s
rsa 2048 bits   0.034315s    0.000875s         29.1       1143.2
OpenSSL 1.1.0e 16 Feb 2017 on LXC debian testing
openssl speed -elapsed md5 sha1 sha256 sha512 aes-128-cbc aes-192-cbc aes-256-cbc rsa2048
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md5             31325.70k    80877.70k   160978.69k   201437.53k   239099.90k
sha1            26711.07k    67869.08k   133916.67k   176733.18k   195865.26k
aes-128 cbc     51030.48k    55728.98k    57828.78k    58100.39k    58518.19k
aes-192 cbc     44298.30k    47816.47k    49265.07k    48639.66k    48878.93k
aes-256 cbc     38719.15k    41587.71k    42660.18k    43187.54k    42715.82k
sha256          16782.62k    38782.93k    72512.26k    91772.93k    98926.59k
sha512           6781.93k    27340.10k    39821.91k    55527.77k    62166.36k
                     sign       verify       sign/s     verify/s
rsa 2048 bits   0.012674s    0.000309s         78.9       3234.2
Yes, those are the weirdest results: the LXC guest gets better numbers than the host OS.
It is not weird but expected. OpenWRT uses different compile time options and compiler parameters. The LXC one is fully optimized for speed. OpenWRT optimizes for a mixture of speed and size.
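To put a number on the gap, here is a small Python sketch that divides the 8192-byte column of the LXC run by the same column of the OpenWRT run posted above. Keep in mind that the two runs also use different OpenSSL versions (1.0.2k vs 1.1.0e), so the ratios mix build options and library changes.

```python
# 8192-byte column, in k (1000 bytes/s), copied from the two runs above.
OPENWRT = {
    "md5": 187555.84, "sha1": 55833.94, "sha256": 42874.20, "sha512": 15136.09,
    "aes-128 cbc": 42371.75, "aes-192 cbc": 36948.65, "aes-256 cbc": 32847.19,
}
LXC = {
    "md5": 239099.90, "sha1": 195865.26, "sha256": 98926.59, "sha512": 62166.36,
    "aes-128 cbc": 58518.19, "aes-192 cbc": 48878.93, "aes-256 cbc": 42715.82,
}


def speedups(baseline, other):
    """Ratio other/baseline for every algorithm present in the baseline."""
    return {name: other[name] / baseline[name] for name in baseline}


RATIOS = speedups(OPENWRT, LXC)
for name, ratio in sorted(RATIOS.items(), key=lambda kv: kv[1]):
    print(f"{name:12s} {ratio:.2f}x")
```

The AES ratios stay modest while the SHA ones are much larger, which fits the explanation that build options, not the container, make the difference.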
How do we get hw crypto to run inside a local container?
There is an engine called af_alg. This one works nearly everywhere.
Time used for normal tests is about 15 seconds user time. With af_alg this changes to about 8s user, 1s sys with 15s real.
Host without af_alg:
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md5             17275.91k    54242.52k   129705.73k   199594.67k   237985.79k
sha1            15254.19k    45999.79k   108085.93k   166523.22k   196116.48k
sha256          11945.99k    32213.99k    66494.55k    90710.02k   101367.81k
aes-128-cbc     48093.78k    55224.23k    57812.39k    58540.71k    58763.95k
aes-192-cbc     40916.72k    47099.54k    48988.33k    49636.69k    49916.59k
aes-256-cbc     37382.25k    41243.95k    42630.57k    42553.34k    42983.42k
Host with af_alg:
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md5               412.31k     1257.39k     4959.15k    18179.41k    81805.31k
sha1              411.68k     1256.00k     4993.28k    18160.64k    79814.66k
sha256            404.95k     1249.92k     5004.54k    17945.94k    82367.83k
aes-128-cbc       770.69k     3081.07k    11760.55k    33311.06k    88560.98k
aes-192-cbc       780.85k     3070.34k    11741.27k    32514.05k    84159.15k
aes-256-cbc       787.19k     3074.79k    11663.62k    31967.91k    80450.90k
Container with af_alg:
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md5               369.91k     1168.70k     4611.33k    17281.02k    80508.25k
sha1              358.55k     1145.58k     4565.67k    16940.37k    73752.58k
sha256            355.31k     1129.49k     4539.05k    16869.38k    79443.29k
aes-128-cbc       729.00k     2877.59k    10902.44k    31802.71k    87037.27k
aes-192-cbc       727.13k     2918.31k    10979.58k    31191.04k    82201.26k
aes-256-cbc       711.20k     2845.55k    10863.53k    30954.84k    79811.93k
All tests were done using:
OpenSSL 1.0.2k 26 Jan 2017
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) idea(int) blowfish(ptr)
compiler: gcc -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -D_FORTIFY_SOURCE=2 -march=armv7-a -mfloat-abi=hard -mfpu=vfpv3-d16 -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -Wl,-O1,--sort-common,--as-needed,-z,relro -O3 -Wall -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM
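The host aes-128-cbc rows above can be used to estimate where offloading starts to pay off. This is only a rough sketch: it assumes each call costs a fixed setup time plus a per-byte time, and fits that straight line through the 16-byte and 8192-byte columns. The model and the function names are my own simplification, not anything OpenSSL reports.

```python
def fit_cost(size_a, rate_a, size_b, rate_b):
    """Fit time(size) = setup + per_byte * size through two (size, bytes/s) points."""
    t_a = size_a / rate_a
    t_b = size_b / rate_b
    per_byte = (t_b - t_a) / (size_b - size_a)
    setup = t_a - per_byte * size_a
    return setup, per_byte


def break_even_size(sw, hw):
    """Block size above which the offloaded path is faster, or None if never."""
    (sw_setup, sw_per_byte), (hw_setup, hw_per_byte) = sw, hw
    if hw_per_byte >= sw_per_byte:
        return None
    return (hw_setup - sw_setup) / (sw_per_byte - hw_per_byte)


# aes-128-cbc on the host, 16-byte and 8192-byte columns.
# openssl speed prints k = 1000 bytes/s, hence the e3 factor.
SOFTWARE = fit_cost(16, 48093.78e3, 8192, 58763.95e3)
AF_ALG = fit_cost(16, 770.69e3, 8192, 88560.98e3)

BREAK_EVEN = break_even_size(SOFTWARE, AF_ALG)
print(f"break-even block size: about {BREAK_EVEN:.0f} bytes")
```

Under these assumptions the crossover lands in the low thousands of bytes, the same ballpark as the roughly 2000 bytes mentioned earlier, and well above a typical 1500-byte MTU.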
In-Kernel test using cryptsetup benchmark --cipher aes-cbc:
without marvell-cesa:
aes-cbc 256b 28.6 MiB/s 28.3 MiB/s
with marvell-cesa:
aes-cbc 256b 89.9 MiB/s 91.2 MiB/s
with 1/4 the sys time used.
Yes, it is really fast, but sadly only aes-cbc is really useful.
Is af_alg possible with the Omnia host OS? How do I run it inside my Debian LXC?
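AF_ALG is a socket interface to the kernel crypto API, and an LXC container shares the host kernel, which is presumably why the container numbers above look so close to the host ones. A quick way to check from inside the container whether AF_ALG sockets can be opened is a few lines of Python; this is a minimal sketch, not the OpenSSL engine itself. The function name and the hashlib fallback are my own additions, and whether a request actually lands on the CESA depends on which driver the kernel has registered with the highest priority.

```python
import hashlib
import socket


def kernel_sha256(data):
    """Return (digest, backend): SHA-256 via AF_ALG, else via hashlib."""
    if hasattr(socket, "AF_ALG"):
        try:
            with socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0) as alg:
                # Ask the kernel crypto API for a sha256 hash transform.
                alg.bind(("hash", "sha256"))
                op, _ = alg.accept()
                with op:
                    # Small input in a single send; larger streams would
                    # need to be chunked with MSG_MORE.
                    op.sendall(data)
                    return op.recv(32), "af_alg"
        except OSError:
            # AF_ALG missing in the kernel or blocked for this container.
            pass
    return hashlib.sha256(data).digest(), "hashlib"


digest, backend = kernel_sha256(b"turris omnia")
print(backend, digest.hex())
```

If this prints "af_alg" inside the container, the kernel interface is reachable there and an af_alg engine for OpenSSL has something to talk to.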