The i.MX6ULL comes with a Data Co-Processor (DCP) that accelerates AES-128, SHA-1, and SHA-256. When testing SHA-256, I am not seeing any acceleration take place.
For simplicity, the testing here is done on the i.MX6ULL-EVK running the standard image for that EVK:
root@imx6ull14x14evk:~# openssl speed sha256
Doing sha256 for 3s on 16 size blocks: 635770 sha256's in 3.00s
Doing sha256 for 3s on 64 size blocks: 364232 sha256's in 3.00s
Doing sha256 for 3s on 256 size blocks: 161926 sha256's in 3.00s
Doing sha256 for 3s on 1024 size blocks: 50296 sha256's in 2.99s
Doing sha256 for 3s on 8192 size blocks: 6761 sha256's in 3.00s
OpenSSL 1.0.2h 3 May 2016
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) idea(int) blowfish(ptr)
compiler: arm-poky-linux-gnueabi-gcc -march=armv7ve -mfpu=neon -mfloat-abi=hard -mcpu=cortex-a7 --sysroot=/home/bamboo/build/4.1.X-2.0.0_ga/fsl-imx-x11/temp_build_dir/build_fsl-imx-x11/tmp/sysroots/imx6ul7d -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -DTERMIO -O2 -pipe -g -feliminate-unused-debug-types -fdebug-prefix-map=/home/bamboo/build/4.1.X-2.0.0_ga/fsl-imx-x11/temp_build_dir/build_fsl-imx-x11/tmp/work/cortexa7hf-neon-poky-linux-gnueabi/openssl/1.0.2h-r0=/usr/src/debug/openssl/1.0.2h-r0 -fdebug-prefix-map=/home/bamboo/build/4.1.X-2.0.0_ga/fsl-imx-x11/temp_build_dir/build_fsl-imx-x11/tmp/sysroots/x86_64-linux= -fdebug-prefix-map=/home/bamboo/build/4.1.X-2.0.0_ga/fsl-imx-x11/temp_build_dir/build_fsl-imx-x11/tmp/sysroots/imx6ul7d= -Wall -Wa,--noexecstack -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
sha256 3390.77k 7770.28k 13817.69k 17225.12k 18462.04k
root@imx6ull14x14evk:~# modprobe cryptodev
cryptodev: driver 1.8 loaded.
root@imx6ull14x14evk:~# openssl speed sha256 -engine cryptodev
engine "cryptodev" set.
Doing sha256 for 3s on 16 size blocks: 641864 sha256's in 3.00s
Doing sha256 for 3s on 64 size blocks: 364203 sha256's in 3.00s
Doing sha256 for 3s on 256 size blocks: 161940 sha256's in 3.00s
Doing sha256 for 3s on 1024 size blocks: 50285 sha256's in 3.00s
Doing sha256 for 3s on 8192 size blocks: 6762 sha256's in 3.00s
OpenSSL 1.0.2h 3 May 2016
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) idea(int) blowfish(ptr)
compiler: arm-poky-linux-gnueabi-gcc -march=armv7ve -mfpu=neon -mfloat-abi=hard -mcpu=cortex-a7 --sysroot=/home/bamboo/build/4.1.X-2.0.0_ga/fsl-imx-x11/temp_build_dir/build_fsl-imx-x11/tmp/sysroots/imx6ul7d -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -DTERMIO -O2 -pipe -g -feliminate-unused-debug-types -fdebug-prefix-map=/home/bamboo/build/4.1.X-2.0.0_ga/fsl-imx-x11/temp_build_dir/build_fsl-imx-x11/tmp/work/cortexa7hf-neon-poky-linux-gnueabi/openssl/1.0.2h-r0=/usr/src/debug/openssl/1.0.2h-r0 -fdebug-prefix-map=/home/bamboo/build/4.1.X-2.0.0_ga/fsl-imx-x11/temp_build_dir/build_fsl-imx-x11/tmp/sysroots/x86_64-linux= -fdebug-prefix-map=/home/bamboo/build/4.1.X-2.0.0_ga/fsl-imx-x11/temp_build_dir/build_fsl-imx-x11/tmp/sysroots/imx6ul7d= -Wall -Wa,--noexecstack -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
sha256 3423.27k 7769.66k 13818.88k 17163.95k 18464.77k
root@imx6ull14x14evk:~# cat /proc/interrupts | grep dcp
236: 0 GPC 46 Level dcp-vmi-irq
237: 65 GPC 47 Level dcp-irq
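For anyone repeating this check, the dcp-irq comparison can be scripted. This is a minimal Python sketch, not part of the original test. `irq_count` is a hypothetical helper, and the two snapshots are simply the `/proc/interrupts` lines quoted in this post, pasted in as strings. It assumes the per-CPU counters come right after the IRQ number, and that both snapshots come from the same boot.

```python
def irq_count(interrupts_text, name):
    """Return the summed per-CPU count for the IRQ whose last column is `name`."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[-1] == name:
            total = 0
            # fields[0] is the IRQ number ("237:"); counters follow until
            # the first non-numeric column (the interrupt controller name).
            for field in fields[1:]:
                if not field.isdigit():
                    break
                total += int(field)
            return total
    return None  # IRQ name not present


# Snapshots copied from this post (after the SHA-256 run and after the SHA-1 run).
AFTER_SHA256 = """236: 0 GPC 46 Level dcp-vmi-irq
237: 65 GPC 47 Level dcp-irq"""
AFTER_SHA1 = """236: 0 GPC 46 Level dcp-vmi-irq
237: 84330 GPC 47 Level dcp-irq"""

first = irq_count(AFTER_SHA256, "dcp-irq")
second = irq_count(AFTER_SHA1, "dcp-irq")
print("dcp-irq after SHA-256 run:", first)
print("dcp-irq after SHA-1 run:", second)
print("difference:", second - first)
```

On the target, the same function can be fed `open("/proc/interrupts").read()` immediately before and after a benchmark. A difference near zero suggests the DCP was never invoked for that run.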
If I perform the same test for AES-128 or SHA-1, I get reasonable results. For example, SHA-1 produces:
root@imx6ull14x14evk:~# openssl speed sha1
Doing sha1 for 3s on 16 size blocks: 489675 sha1's in 3.00s
Doing sha1 for 3s on 64 size blocks: 371930 sha1's in 2.99s
Doing sha1 for 3s on 256 size blocks: 220473 sha1's in 3.00s
Doing sha1 for 3s on 1024 size blocks: 82921 sha1's in 3.00s
Doing sha1 for 3s on 8192 size blocks: 12197 sha1's in 3.00s
OpenSSL 1.0.2h 3 May 2016
...
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
sha1 2611.60k 7961.04k 18813.70k 28303.70k 33305.94k
Note the performance gains for the 1024 and 8192 block sizes. Also check the DCP interrupt and observe that the count is reasonable:
root@imx6ull14x14evk:~# modprobe cryptodev
cryptodev: driver 1.8 loaded.
root@imx6ull14x14evk:~# openssl speed sha1 -engine cryptodev
engine "cryptodev" set.
Doing sha1 for 3s on 16 size blocks: 16345 sha1's in 0.25s
Doing sha1 for 3s on 64 size blocks: 19929 sha1's in 0.35s
Doing sha1 for 3s on 256 size blocks: 18929 sha1's in 0.28s
Doing sha1 for 3s on 1024 size blocks: 14448 sha1's in 0.30s
Doing sha1 for 3s on 8192 size blocks: 7307 sha1's in 0.26s
OpenSSL 1.0.2h 3 May 2016
...
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
sha1 1046.08k 3644.16k 17306.51k 49315.84k 230226.71k
root@imx6ull14x14evk:~# cat /proc/interrupts | grep dcp
236: 0 GPC 46 Level dcp-vmi-irq
237: 84330 GPC 47 Level dcp-irq
What's going on?
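One thing worth keeping in mind when reading the tables above is how the "k" figures are derived. The sketch below is an illustration in Python, not anything taken from OpenSSL itself, and `throughput_k` is a hypothetical helper name. It recomputes the reported figures from the "Doing ..." lines. As far as I know, `openssl speed` divides by CPU user time unless `-elapsed` is given, which would explain the 0.26s denominators in the cryptodev SHA-1 run. The 3-second wall-clock figure used for comparison is an assumption based on the requested run length, not a measured value.

```python
def throughput_k(count, block_size, seconds):
    """Throughput in 1000s of bytes per second, the unit openssl speed prints."""
    return count * block_size / seconds / 1000.0


# Software SHA-256, 16-byte blocks: 635770 hashes in 3.00s.
sw_sha256_16 = throughput_k(635770, 16, 3.00)

# SHA-1 via cryptodev, 8192-byte blocks: 7307 hashes in 0.26s of reported time.
reported = throughput_k(7307, 8192, 0.26)

# Same hash count, but divided by the 3s the test was asked to run for
# (assumed wall-clock duration, see the note above).
wall_clock = throughput_k(7307, 8192, 3.0)

print(f"sha256 software, 16 bytes:      {sw_sha256_16:.2f}k")
print(f"sha1 cryptodev, 8192 (reported): {reported:.2f}k")
print(f"sha1 cryptodev, 8192 (over 3s):  {wall_clock:.2f}k")
```

If that assumption holds, re-running with `openssl speed -elapsed sha1 -engine cryptodev` should give figures that can be compared directly with the software-only run.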
Hi
Not much is said on the NXP forums about the DCP, or even about cryptodev. Here are my findings:
I don't see speed issues using a newer OpenSSL on the i.MX6ULL, but I do see issues with both SHA1 and SHA256.
1) With added support for HMAC in cryptodev
Not a big deal, just don't enable HMAC in cryptodev. But it's weird: there is no hmac(sha1) listed in /proc/crypto. Does Linux ignore the lack of HMAC support in the driver and assume that all sha1/sha256 drivers support it? Yes, the driver is missing setkey() for the SHAs.
2) cryptodev + openssl is OK for AES ciphers like -aes-128-cbc, provided you supply a separate raw key (-K) and IV (-iv). But if you supply a passphrase and let openssl calculate the key and IV (PBKDF2), then encryption / decryption again produces odd data.
It is possible to disable SHA in cryptodev (see ioctl.c, you need to suppress the corresponding `case CRYPTO_SHAxxx` cases), but it would be best to fix them.
3) Using the SoftEther VPN server and its "AES128-SHA" cipher. Clients cannot connect until I disable both SHA1 and SHA256 in cryptodev.
Perhaps the mxs-dcp driver signals SHA completion too early? Or is it a hardware issue?
Edward
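To illustrate why point 2 above ties back to SHA: when the key and IV are derived from a passphrase, every derived byte is the output of a hash, so a misbehaving SHA path corrupts the key material even when the AES path itself is fine. The following is a minimal Python sketch using the standard `hashlib` module. The function name `derive_key_iv`, the all-zero 8-byte salt, the 10000 iterations, and the key-then-IV layout are assumptions chosen for illustration, not a statement of what any particular openssl build does.

```python
import hashlib


def derive_key_iv(passphrase, salt, iterations=10000):
    """Derive a 16-byte AES-128 key and a 16-byte IV from a passphrase."""
    # 32 bytes of PBKDF2-HMAC-SHA256 output, split as key followed by IV
    # (assumed layout, for illustration only).
    material = hashlib.pbkdf2_hmac("sha256", passphrase, salt, iterations, dklen=32)
    return material[:16], material[16:]


salt = bytes(8)  # hypothetical fixed salt so the example is reproducible
key, iv = derive_key_iv(b"secret", salt)
print("key:", key.hex())
print("iv: ", iv.hex())
```

With `-K` and `-iv` given explicitly, no digest is involved in setting up the cipher, which is consistent with raw keys working while passphrases do not.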
How do I enable this hardware acceleration on the i.MX6ULL? Is there a corresponding driver?
@changbaoma
Hello,
Customers can try the following:
https://github.com/f-secure-foundry/mxs-dcp
Regards,
Yuri.