
i.MX6ULL issue with DCP for SHA-256

Question asked by andreiv on Jul 9, 2019
Latest reply on Jul 10, 2019 by Yuri Muhin

The i.MX6ULL includes a Data Co-Processor (DCP) to accelerate AES-128, SHA-1, and SHA-256. When testing SHA-256, I see no sign that acceleration is taking place.

For simplicity, testing here is done on the i.MX6ULL-EVK running the standard image for that EVK:

  1. Power up the EVK and log in
  2. Run OpenSSL speed test without cryptodev to get baseline performance
    root@imx6ull14x14evk:~# openssl speed sha256
    Doing sha256 for 3s on 16 size blocks: 635770 sha256's in 3.00s
    Doing sha256 for 3s on 64 size blocks: 364232 sha256's in 3.00s
    Doing sha256 for 3s on 256 size blocks: 161926 sha256's in 3.00s
    Doing sha256 for 3s on 1024 size blocks: 50296 sha256's in 2.99s
    Doing sha256 for 3s on 8192 size blocks: 6761 sha256's in 3.00s
    OpenSSL 1.0.2h 3 May 2016
    built on: reproducible build, date unspecified
    options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) idea(int) blowfish(ptr)
    compiler: arm-poky-linux-gnueabi-gcc -march=armv7ve -mfpu=neon -mfloat-abi=hard -mcpu=cortex-a7 --sysroot=/home/bamboo/build/4.1.X-2.0.0_ga/fsl-imx-x11/temp_build_dir/build_fsl-imx-x11/tmp/sysroots/imx6ul7d -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -DTERMIO -O2 -pipe -g -feliminate-unused-debug-types -fdebug-prefix-map=/home/bamboo/build/4.1.X-2.0.0_ga/fsl-imx-x11/temp_build_dir/build_fsl-imx-x11/tmp/work/cortexa7hf-neon-poky-linux-gnueabi/openssl/1.0.2h-r0=/usr/src/debug/openssl/1.0.2h-r0 -fdebug-prefix-map=/home/bamboo/build/4.1.X-2.0.0_ga/fsl-imx-x11/temp_build_dir/build_fsl-imx-x11/tmp/sysroots/x86_64-linux= -fdebug-prefix-map=/home/bamboo/build/4.1.X-2.0.0_ga/fsl-imx-x11/temp_build_dir/build_fsl-imx-x11/tmp/sysroots/imx6ul7d= -Wall -Wa,--noexecstack -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM
    The 'numbers' are in 1000s of bytes per second processed.
    type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
    sha256 3390.77k 7770.28k 13817.69k 17225.12k 18462.04k
  3. Load cryptodev driver:
    root@imx6ull14x14evk:~# modprobe cryptodev
    cryptodev: driver 1.8 loaded.
  4. Now run OpenSSL speed test with the cryptodev engine
    root@imx6ull14x14evk:~# openssl speed sha256 -engine cryptodev
    engine "cryptodev" set.
    Doing sha256 for 3s on 16 size blocks: 641864 sha256's in 3.00s
    Doing sha256 for 3s on 64 size blocks: 364203 sha256's in 3.00s
    Doing sha256 for 3s on 256 size blocks: 161940 sha256's in 3.00s
    Doing sha256 for 3s on 1024 size blocks: 50285 sha256's in 3.00s
    Doing sha256 for 3s on 8192 size blocks: 6762 sha256's in 3.00s
    OpenSSL 1.0.2h 3 May 2016
    built on: reproducible build, date unspecified
    options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) idea(int) blowfish(ptr)
    compiler: arm-poky-linux-gnueabi-gcc -march=armv7ve -mfpu=neon -mfloat-abi=hard -mcpu=cortex-a7 --sysroot=/home/bamboo/build/4.1.X-2.0.0_ga/fsl-imx-x11/temp_build_dir/build_fsl-imx-x11/tmp/sysroots/imx6ul7d -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -DTERMIO -O2 -pipe -g -feliminate-unused-debug-types -fdebug-prefix-map=/home/bamboo/build/4.1.X-2.0.0_ga/fsl-imx-x11/temp_build_dir/build_fsl-imx-x11/tmp/work/cortexa7hf-neon-poky-linux-gnueabi/openssl/1.0.2h-r0=/usr/src/debug/openssl/1.0.2h-r0 -fdebug-prefix-map=/home/bamboo/build/4.1.X-2.0.0_ga/fsl-imx-x11/temp_build_dir/build_fsl-imx-x11/tmp/sysroots/x86_64-linux= -fdebug-prefix-map=/home/bamboo/build/4.1.X-2.0.0_ga/fsl-imx-x11/temp_build_dir/build_fsl-imx-x11/tmp/sysroots/imx6ul7d= -Wall -Wa,--noexecstack -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM
    The 'numbers' are in 1000s of bytes per second processed.
    type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
    sha256 3423.27k 7769.66k 13818.88k 17163.95k 18464.77k
  5. Note that the results in steps (2) and (4) are nearly identical. Also check the DCP interrupts and note the abnormally low DCP interrupt count: only 65!
    root@imx6ull14x14evk:~# cat /proc/interrupts | grep dcp
    236: 0 GPC 46 Level dcp-vmi-irq
    237: 65 GPC 47 Level dcp-irq
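
To make the interrupt check in step (5) less dependent on whatever ran earlier, the count can be sampled before and after each benchmark. The following is a minimal sketch, not a definitive tool: the helper names `dcp_irq_count` and `irq_delta` are hypothetical, and it assumes the interrupt line ends in `dcp-irq` as in the output above.

```shell
#!/bin/sh
# Sketch only: dcp_irq_count and irq_delta are hypothetical helper names.

# Print the total interrupt count for the line whose last field is "dcp-irq".
# Reads /proc/interrupts by default; another file can be passed for testing.
dcp_irq_count() {
    awk '$NF == "dcp-irq" {
             # Sum the per-CPU counters that follow the "NNN:" label.
             for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) sum += $i
         }
         END { print sum + 0 }' "${1:-/proc/interrupts}"
}

# Run a command and print how many DCP interrupts fired while it ran.
irq_delta() {
    _before=$(dcp_irq_count)
    "$@" >/dev/null 2>&1
    _after=$(dcp_irq_count)
    echo $((_after - _before))
}

# Example (on the EVK, after "modprobe cryptodev"):
#   irq_delta openssl speed sha256 -engine cryptodev
#   irq_delta openssl speed sha1 -engine cryptodev
```

A delta near zero for the SHA-256 run would suggest that the hashing never reached the DCP, while a large delta for SHA-1 would match the behaviour described here.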

If I run the same test for AES-128 or SHA-1, I get reasonable results. For example, SHA-1 produces:

  • Software mode (no cryptodev):
    root@imx6ull14x14evk:~# openssl speed sha1
    Doing sha1 for 3s on 16 size blocks: 489675 sha1's in 3.00s
    Doing sha1 for 3s on 64 size blocks: 371930 sha1's in 2.99s
    Doing sha1 for 3s on 256 size blocks: 220473 sha1's in 3.00s
    Doing sha1 for 3s on 1024 size blocks: 82921 sha1's in 3.00s
    Doing sha1 for 3s on 8192 size blocks: 12197 sha1's in 3.00s
    OpenSSL 1.0.2h 3 May 2016
    ...
    The 'numbers' are in 1000s of bytes per second processed.
    type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
    sha1 2611.60k 7961.04k 18813.70k 28303.70k 33305.94k
  • With cryptodev:
    root@imx6ull14x14evk:~# modprobe cryptodev
    cryptodev: driver 1.8 loaded.
    root@imx6ull14x14evk:~# openssl speed sha1 -engine cryptodev
    engine "cryptodev" set.
    Doing sha1 for 3s on 16 size blocks: 16345 sha1's in 0.25s
    Doing sha1 for 3s on 64 size blocks: 19929 sha1's in 0.35s
    Doing sha1 for 3s on 256 size blocks: 18929 sha1's in 0.28s
    Doing sha1 for 3s on 1024 size blocks: 14448 sha1's in 0.30s
    Doing sha1 for 3s on 8192 size blocks: 7307 sha1's in 0.26s
    OpenSSL 1.0.2h 3 May 2016
    ...
    The 'numbers' are in 1000s of bytes per second processed.
    type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
    sha1 1046.08k 3644.16k 17306.51k 49315.84k 230226.71k
    Note the performance gains for the 1024- and 8192-byte block sizes. Also check the DCP interrupt count and observe that it is reasonable:
    root@imx6ull14x14evk:~# cat /proc/interrupts | grep dcp
    236: 0 GPC 46 Level dcp-vmi-irq
    237: 84330 GPC 47 Level dcp-irq
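
For reference, the summary figures in these tables can be reproduced from the "Doing ..." lines as count × block size ÷ reported seconds ÷ 1000. The sketch below uses a hypothetical helper name, `throughput_k`, and only restates that arithmetic. Note that the SHA-1 cryptodev runs report roughly 0.3 s rather than 3 s. If those seconds are CPU time rather than elapsed time (an assumption on my part; `openssl speed` has an `-elapsed` option for wall-clock measurement), then the large cryptodev figures partly reflect how little CPU time was consumed rather than raw hashing speed.

```shell
#!/bin/sh
# Sketch only: throughput_k is a hypothetical helper name.

# throughput_k COUNT BLOCK_BYTES SECONDS
# Prints thousands of bytes per second, as in the openssl speed summary table.
throughput_k() {
    awk -v n="$1" -v b="$2" -v t="$3" \
        'BEGIN { printf "%.2f\n", n * b / t / 1000 }'
}

# SHA-256, software, 8192-byte blocks: 6761 hashes in 3.00 s
throughput_k 6761 8192 3.00    # prints 18462.04

# SHA-1, cryptodev, 8192-byte blocks: 7307 hashes in 0.26 s
throughput_k 7307 8192 0.26    # prints 230226.71
```

Re-running the benchmarks with `-elapsed` would show whether the wall-clock throughput tells the same story.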

What's going on?