Hi all,
We use the MX6Q in our project to run computer vision algorithms, so its floating-point performance matters a great deal to us.
The platform is our own custom board with 1 GB of DDR3 memory.
The test benchmark is nbench-byte-2.2.3.tar.gz (the Linux/Unix port of nbench).
For the hardfp tests, we use a hardfp rootfs from Debian.
(1) First, we evaluate the hardfp performance with the Debian-hf rootfs, building nbench natively with gcc (4.6.3) and the following CFLAGS:
CFLAGS = -s -static -Wall -O3 -march=armv7-a -mtune=cortex-a9 -mfpu=neon -mfloat-abi=hard
Here are the details of the gcc used to build the nbench application:
root@debian-armhf: apt-get install binutils gcc
root@debian-armhf: gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-linux-gnueabihf/4.6/lto-wrapper
Target: arm-linux-gnueabihf
Configured with: ../src/configure -v --with-pkgversion='Debian 4.6.3-14' --with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.6 --enable-shared --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.6 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --enable-plugin --enable-objc-gc --disable-sjlj-exceptions --with-arch=armv7-a --with-fpu=vfpv3-d16 --with-float=hard --with-mode=thumb --enable-checking=release --build=arm-linux-gnueabihf --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf
Thread model: posix
gcc version 4.6.3 (Debian 4.6.3-14)
Here is the score/result of the native hardfp version of nbench:
root@debian-armhf:/home/float/nbench-byte-pc# ./nbench
BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)
TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 526.8 : 13.51 : 4.44
STRING SORT : 60.88 : 27.20 : 4.21
BITFIELD : 1.6582e+08 : 28.44 : 5.94
FP EMULATION : 67.413 : 32.35 : 7.46
FOURIER : 6146.9 : 6.99 : 3.93
ASSIGNMENT : 7.6712 : 29.19 : 7.57
IDEA : 1490.5 : 22.80 : 6.77
HUFFMAN : 771.06 : 21.38 : 6.83
NEURAL NET : 8.4347 : 13.55 : 5.70
LU DECOMPOSITION : 295.52 : 15.31 : 11.05
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 24.164
FLOATING-POINT INDEX: 11.319
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU : 4 CPU
L2 Cache :
OS : Linux 3.0.35sensor
C compiler : gcc version 4.6.3 (Debian 4.6.3-14)
libc : libc-2.13.so
MEMORY INDEX : 5.743
INTEGER INDEX : 6.255
FLOATING-POINT INDEX: 6.278
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder
(2) Next, the same test with an Ubuntu armhf rootfs and its native gcc (4.6.3):
ubuntu@ubuntu-armhf:~/test/nbench-byte-2.2.3$ uname -a
Linux ubuntu-armhf 3.14.28-rt25-1.0.0_ga-132797-g4da02de-dirty #28 SMP PREEMPT RT Sat Oct 17 17:35:31 CST 2015 armv7l armv7l armv7l GNU/Linux
ubuntu@ubuntu-armhf:~/test/nbench-byte-2.2.3$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-linux-gnueabihf/4.6/lto-wrapper
Target: arm-linux-gnueabihf
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 4.6.3-1ubuntu5' --with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.6 --enable-shared --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.6 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --enable-plugin --enable-objc-gc --enable-multilib --disable-sjlj-exceptions --with-arch=armv7-a --with-float=hard --with-fpu=vfpv3-d16 --with-mode=thumb --disable-werror --enable-checking=release --build=arm-linux-gnueabihf --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf
Thread model: posix
gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)
CFLAGS = -s -static -Wall -O3 -march=armv7-a -mtune=cortex-a9 -mfpu=neon -mfloat-abi=hard
ubuntu@ubuntu-armhf:~/test/nbench-byte-2.2.3$ ./nbench
BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)
TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 479.2 : 12.29 : 4.04
STRING SORT : 62.055 : 27.73 : 4.29
BITFIELD : 1.6514e+08 : 28.33 : 5.92
FP EMULATION : 72.022 : 34.56 : 7.97
FOURIER : 6580.5 : 7.48 : 4.20
ASSIGNMENT : 7.2465 : 27.57 : 7.15
IDEA : 1507.9 : 23.06 : 6.85
HUFFMAN : 816.73 : 22.65 : 7.23
NEURAL NET : 8.7615 : 14.07 : 5.92
LU DECOMPOSITION : 296.56 : 15.36 : 11.09
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 24.160
FLOATING-POINT INDEX: 11.740
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU : 4 CPU ARMv7 Processor rev 10 (v7l)
L2 Cache :
OS : Linux 3.14.28-rt25-1.0.0_ga-132797-g4da02de-dirty
C compiler : gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)
libc : libc-2.15.so
MEMORY INDEX : 5.663
INTEGER INDEX : 6.319
FLOATING-POINT INDEX: 6.511
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.
ubuntu@ubuntu-armhf:~/test/nbench-byte-2.2.3$ readelf -A nbench
Attribute Section: aeabi
File Attributes
Tag_CPU_name: "7-A"
Tag_CPU_arch: v7
Tag_CPU_arch_profile: Application
Tag_ARM_ISA_use: Yes
Tag_THUMB_ISA_use: Thumb-2
Tag_FP_arch: VFPv3
Tag_Advanced_SIMD_arch: NEONv1
Tag_ABI_PCS_wchar_t: 4
Tag_ABI_FP_denormal: Needed
Tag_ABI_FP_exceptions: Needed
Tag_ABI_FP_number_model: IEEE 754
Tag_ABI_align_needed: 8-byte
Tag_ABI_align_preserved: 8-byte, except leaf SP
Tag_ABI_enum_size: int
Tag_ABI_HardFP_use: SP and DP
Tag_ABI_VFP_args: VFP registers
Tag_CPU_unaligned_access: v6
Tag_DIV_use: Not allowed
"Tag_ABI_VFP_args: VFP registers" shows that the binary uses the hard-float calling convention (floating-point arguments are passed in VFP registers).
(3) Freescale also provides a cross-compiling toolchain (gcc-linaro-arm-linux-gnueabihf-4.7-2013.04-20130415), which we used to build a hardfp version of nbench:
root@debian-armhf: /opt/freescale/gcc-linaro-arm-linux-gnueabihf-4.7-2013.04-20130415_linux/bin/arm-linux-gnueabihf-gcc -v
Using built-in specs.
COLLECT_GCC=/home/percy/project/tools/gcc-linaro-arm-linux-gnueabihf-4.7-2013.04-20130415_linux/bin/arm-linux-gnueabihf-gcc
COLLECT_LTO_WRAPPER=/home/percy/project/tools/gcc-linaro-arm-linux-gnueabihf-4.7-2013.04-20130415_linux/bin/../libexec/gcc/arm-linux-gnueabihf/4.7.3/lto-wrapper
Target: arm-linux-gnueabihf
Configured with: /cbuild/slaves/oorts/crosstool-ng/builds/arm-linux-gnueabihf-linux/.build/src/gcc-linaro-4.7-2013.04/configure --build=i686-build_pc-linux-gnu --host=i686-build_pc-linux-gnu --target=arm-linux-gnueabihf --prefix=/cbuild/slaves/oorts/crosstool-ng/builds/arm-linux-gnueabihf-linux/install --with-sysroot=/cbuild/slaves/oorts/crosstool-ng/builds/arm-linux-gnueabihf-linux/install/arm-linux-gnueabihf/libc --enable-languages=c,c++,fortran --enable-multilib --with-arch=armv7-a --with-tune=cortex-a9 --with-fpu=vfpv3-d16 --with-float=hard --with-pkgversion='crosstool-NG linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04' --with-bugurl=https://bugs.launchpad.net/gcc-linaro --enable-__cxa_atexit --enable-libmudflap --enable-libgomp --enable-libssp --with-gmp=/cbuild/slaves/oorts/crosstool-ng/builds/arm-linux-gnueabihf-linux/.build/arm-linux-gnueabihf/build/static --with-mpfr=/cbuild/slaves/oorts/crosstool-ng/builds/arm-linux-gnueabihf-linux/.build/arm-linux-gnueabihf/build/static --with-mpc=/cbuild/slaves/oorts/crosstool-ng/builds/arm-linux-gnueabihf-linux/.build/arm-linux-gnueabihf/build/static --with-ppl=/cbuild/slaves/oorts/crosstool-ng/builds/arm-linux-gnueabihf-linux/.build/arm-linux-gnueabihf/build/static --with-cloog=/cbuild/slaves/oorts/crosstool-ng/builds/arm-linux-gnueabihf-linux/.build/arm-linux-gnueabihf/build/static --with-libelf=/cbuild/slaves/oorts/crosstool-ng/builds/arm-linux-gnueabihf-linux/.build/arm-linux-gnueabihf/build/static --with-host-libstdcxx='-L/cbuild/slaves/oorts/crosstool-ng/builds/arm-linux-gnueabihf-linux/.build/arm-linux-gnueabihf/build/static/lib -lpwl' --enable-threads=posix --disable-libstdcxx-pch --enable-linker-build-id --enable-gold --with-local-prefix=/cbuild/slaves/oorts/crosstool-ng/builds/arm-linux-gnueabihf-linux/install/arm-linux-gnueabihf/libc --enable-c99 --enable-long-long --with-mode=thumb
Thread model: posix
gcc version 4.7.3 20130328 (prerelease) (crosstool-NG linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04)
The CFLAGS used to build nbench are:
CFLAGS = -s -static -Wall -O3 -march=armv7-a -mtune=cortex-a9 -mfpu=neon -mfloat-abi=hard
Here is the nbench score:
root@debian-armhf:/home/float/nbench-byte-hf# ./nbench
BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)
TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 507.28 : 13.01 : 4.27
STRING SORT : 63.044 : 28.17 : 4.36
BITFIELD : 1.2428e+08 : 21.32 : 4.45
FP EMULATION : 68.49 : 32.86 : 7.58
FOURIER : 6720.8 : 7.64 : 4.29
ASSIGNMENT : 7.1967 : 27.38 : 7.10
IDEA : 1591.1 : 24.34 : 7.23
HUFFMAN : 780 : 21.63 : 6.91
NEURAL NET : 9.0302 : 14.51 : 6.10
LU DECOMPOSITION : 306.96 : 15.90 : 11.48
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 23.276
FLOATING-POINT INDEX: 12.081
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU : 4 CPU
L2 Cache :
OS : Linux 3.0.35sensor
C compiler : /home/percy/project/tools/gcc-linaro-arm-linux-gnueabihf-4.7-2013.04-20130415_linux/bin/arm-linux-gnueabihf-gcc
libc : static
MEMORY INDEX : 5.166
INTEGER INDEX : 6.341
FLOATING-POINT INDEX: 6.700
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.
(4) Finally, we used Freescale's cross-compiling toolchain to build a softfp version of nbench:
Here the rootfs is built from LTIB-3.0.35.
root@freescale: /opt/freescale/usr/local/gcc-4.6.2-glibc-2.13-linaro-multilib-2011.12/fsl-linaro-toolchain/bin/arm-linux-gcc -v
Using built-in specs.
COLLECT_GCC=/opt/freescale/usr/local/gcc-4.6.2-glibc-2.13-linaro-multilib-2011.12/fsl-linaro-toolchain/bin/arm-linux-gcc
COLLECT_LTO_WRAPPER=/opt/freescale/usr/local/gcc-4.6.2-glibc-2.13-linaro-multilib-2011.12/fsl-linaro-toolchain/bin/../libexec/gcc/arm-fsl-linux-gnueabi/4.6.2/lto-wrapper
Target: arm-fsl-linux-gnueabi
Configured with: /work/build/.build/src/gcc-linaro-4.6-2011.06-0/configure --build=i686-build_pc-linux-gnu --host=i686-build_pc-linux-gnu --target=arm-fsl-linux-gnueabi --prefix=/work/fsl-linaro-toolchain-2.13 --with-sysroot=/work/fsl-linaro-toolchain-2.13/arm-fsl-linux-gnueabi/multi-libs --enable-languages=c,c++ --with-pkgversion='Freescale MAD -- Linaro 2011.07 -- Built at 2011/08/10 09:20' --enable-__cxa_atexit --disable-libmudflap --disable-libgomp --disable-libssp --with-gmp=/work/build/.build/arm-fsl-linux-gnueabi/build/static --with-mpfr=/work/build/.build/arm-fsl-linux-gnueabi/build/static --with-mpc=/work/build/.build/arm-fsl-linux-gnueabi/build/static --with-ppl=/work/build/.build/arm-fsl-linux-gnueabi/build/static --with-cloog=/work/build/.build/arm-fsl-linux-gnueabi/build/static --with-libelf=/work/build/.build/arm-fsl-linux-gnueabi/build/static --with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm -L/work/build/.build/arm-fsl-linux-gnueabi/build/static/lib -lpwl' --enable-threads=posix --enable-target-optspace --enable-plugin --enable-multilib --with-local-prefix=/work/fsl-linaro-toolchain-2.13/arm-fsl-linux-gnueabi/multi-libs --disable-nls --enable-c99 --enable-long-long --with-system-zlib
Thread model: posix
gcc version 4.6.2 20110630 (prerelease) (Freescale MAD -- Linaro 2011.07 -- Built at 2011/08/10 09:20)
The CFLAGS used to build nbench are:
CFLAGS = -s -static -Wall -O3 -march=armv7-a -mtune=cortex-a9 -mfpu=neon -mfloat-abi=softfp
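Note that with -mfpu=neon, -mfloat-abi=softfp still lets gcc emit VFP/NEON instructions; it only changes the calling convention, so floating-point arguments travel in core registers instead of VFP registers. One hedged way to tell the builds apart is to classify the `readelf -A` attributes, as in the sketch below. The function names and the softfp sample text are illustrative assumptions (the hard-float sample is taken from the readelf dump earlier in this post); a softfp dump was not captured here.

```python
import re

def parse_attributes(readelf_text):
    """Turn 'Tag_Name: value' lines from `readelf -A` output into a dict."""
    attrs = {}
    for line in readelf_text.splitlines():
        match = re.match(r"\s*(Tag_\w+):\s*(.*)", line)
        if match:
            attrs[match.group(1)] = match.group(2).strip()
    return attrs

def classify_float_abi(readelf_text):
    """Return 'hard', 'softfp' or 'soft' based on the build attributes.

    Simplified rule of thumb:
      - Tag_ABI_VFP_args: VFP registers -> hard-float calling convention
      - Tag_FP_arch present, no VFP args -> FPU instructions, soft-float calls
      - neither                          -> no FPU use recorded
    """
    attrs = parse_attributes(readelf_text)
    if attrs.get("Tag_ABI_VFP_args") == "VFP registers":
        return "hard"
    if "Tag_FP_arch" in attrs:
        return "softfp"
    return "soft"

# Lines taken from the readelf dump of the Ubuntu hardfp build in this post.
HARD_SAMPLE = """\
  Tag_FP_arch: VFPv3
  Tag_Advanced_SIMD_arch: NEONv1
  Tag_ABI_HardFP_use: SP and DP
  Tag_ABI_VFP_args: VFP registers
"""

# Hypothetical softfp dump (assumption): same FPU tags, no Tag_ABI_VFP_args.
SOFTFP_SAMPLE = """\
  Tag_FP_arch: VFPv3
  Tag_Advanced_SIMD_arch: NEONv1
"""

if __name__ == "__main__":
    print(classify_float_abi(HARD_SAMPLE))
    print(classify_float_abi(SOFTFP_SAMPLE))
```

On the target, the text could come from running `readelf -A nbench` and passing its output to `classify_float_abi`.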
Here is the nbench score:
root@freescale:/home/float/nbench-byte-sf# ./nbench
BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)
TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 532.08 : 13.65 : 4.48
STRING SORT : 62.19 : 27.79 : 4.30
BITFIELD : 1.9527e+08 : 33.50 : 7.00
FP EMULATION : 88.005 : 42.23 : 9.74
FOURIER : 6905.9 : 7.85 : 4.41
ASSIGNMENT : 7.6802 : 29.22 : 7.58
IDEA : 1301.7 : 19.91 : 5.91
HUFFMAN : 878.24 : 24.35 : 7.78
NEURAL NET : 8.8688 : 14.25 : 5.99
LU DECOMPOSITION : 305.28 : 15.82 : 11.42
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 25.796
FLOATING-POINT INDEX: 12.095
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU : 4 CPU
L2 Cache :
OS : Linux 3.0.35sensor
C compiler : /opt/freescale/usr/local/gcc-4.6.2-glibc-2.13-linaro-multilib-2011.12/fsl-linaro-toolchain/bin/arm-linux-gcc
libc : static
MEMORY INDEX : 6.110
INTEGER INDEX : 6.694
FLOATING-POINT INDEX: 6.708
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.
From the results above, the nbench hardfp performance is almost the same as the softfp performance.
In theory, hardfp should be noticeably faster than softfp (by about 20%).
How can these test results be explained?
Have I enabled hardfp correctly, i.e. are the gcc CFLAGS right?
What is the peak hardfp/softfp performance of the MX6Q?
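To put numbers on the comparison, here is a small sketch that computes how far each hardfp run's floating-point index is from the softfp run. The index values are copied from the four nbench reports in this post; the dictionary labels and the helper names `percent_vs_baseline` and `compare` are introduced here only for illustration.

```python
# FLOATING-POINT INDEX values copied from the four nbench reports above,
# stored as (original BYTEmark index, Linux index).
FP_INDEX = {
    "(1) Debian native gcc 4.6.3, hard": (11.319, 6.278),
    "(2) Ubuntu native gcc 4.6.3, hard": (11.740, 6.511),
    "(3) Linaro cross gcc 4.7.3, hard": (12.081, 6.700),
    "(4) Freescale cross gcc 4.6.2, softfp": (12.095, 6.708),
}
BASELINE = "(4) Freescale cross gcc 4.6.2, softfp"

def percent_vs_baseline(value, baseline):
    """Relative difference of value against baseline, in percent."""
    return (value - baseline) / baseline * 100.0

def compare(results, baseline_key):
    """Map each run to its (original, Linux) index difference vs the baseline."""
    base_old, base_new = results[baseline_key]
    return {
        name: (percent_vs_baseline(old, base_old),
               percent_vs_baseline(new, base_new))
        for name, (old, new) in results.items()
    }

if __name__ == "__main__":
    for name, (d_old, d_new) in compare(FP_INDEX, BASELINE).items():
        print(f"{name:40s} original {d_old:+6.2f}%   linux {d_new:+6.2f}%")
```

With these figures, every hardfp run lands within roughly 7% of the softfp run and none is ahead of it. That would be consistent with the softfp build also executing FPU instructions, since it was compiled with -mfpu=neon and -mfloat-abi=softfp only changes how floating-point arguments are passed. It is not a measurement of the calling-convention overhead itself.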
Robbie
Hi Robbie,
In general, if performance data is not published on the official Freescale website, it can be requested from local marketing.
Some basic performance data (such as basic floating-point operations) can be obtained using LMBench:
https://community.freescale.com/docs/DOC-94571
A useful OpenCV link:
http://imxcv.blogspot.mx/2014/02/building-opencv-24x-for-freescales-imx6.html
Best regards
igor