I was indeed using an older IDE and SDK (11.0.0 and 2.6.2). I have now downloaded the newest versions (11.1.0 and 2.7.0, which already uses mbedTLS 2.16.2). However, the results still do not make sense to me:
DEBUG:
Asymmetric encryption: CASPER HW accelerated
ECDSA-secp256r1 : 5.00 sign/s
ECDSA-secp256r1 : 4.67 verify/s
Asymmetric encryption: Software implementation (DEBUG)
ECDSA-secp256r1 : 5.00 sign/s
ECDSA-secp256r1 : 4.67 verify/s
RELEASE:
Asymmetric encryption: CASPER HW accelerated
ECDSA-secp256r1 : 5.33 sign/s
ECDSA-secp256r1 : 5.00 verify/s
Asymmetric encryption: Software implementation
ECDSA-secp256r1 : 5.67 sign/s
ECDSA-secp256r1 : 5.33 verify/s
Whether the hardware or the software implementation is used makes basically no difference. I tried various optimization settings, which did not change much. It makes sense to me that (at least in my version of the SDK) there is little difference between DEBUG and RELEASE and between the various optimization settings, as the compute-intensive part is done in hardware, which should not be affected by those settings. However, hardware vs. software implementation should make a difference.
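To make the comparison easier to read, the throughput figures above can be converted into time per operation and into an HW/SW ratio. This is only a small sketch of that arithmetic; the function names are hypothetical and not part of the SDK or mbedTLS.

```c
#include <stdio.h>

/* Convert a throughput figure (operations per second) into the
 * average time one operation takes, in milliseconds. */
double ms_per_op(double ops_per_sec)
{
    return 1000.0 / ops_per_sec;
}

/* Ratio of hardware to software throughput. A value near 1.0 means
 * the accelerator gives no measurable benefit in this benchmark. */
double hw_speedup(double hw_ops_per_sec, double sw_ops_per_sec)
{
    return hw_ops_per_sec / sw_ops_per_sec;
}

/* Print one comparison row (hypothetical helper for illustration). */
void print_comparison(const char *label, double hw, double sw)
{
    printf("%-16s HW %.0f ms/op, SW %.0f ms/op, HW/SW = %.2f\n",
           label, ms_per_op(hw), ms_per_op(sw), hw_speedup(hw, sw));
}
```

For example, 5.00 sign/s corresponds to 200 ms per signature, and calling print_comparison("RELEASE sign", 5.33, 5.67) gives a ratio below 1.0, i.e. the "HW accelerated" build is marginally slower than the "software" build in my measurements.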
I debugged the application and found that in both cases (HW and SW), various CASPER functions are called. Defining MBEDTLS_FREESCALE_CASPER_PKHA or commenting it out does not seem to make a difference. But even if the similar results are explained by CASPER also running when it should not, the results in AN12445 Table 8 are still more than 2 times better than mine in any build/optimization configuration. I expect the results to differ slightly, as AN12445 was produced with IAR instead of MCUXpresso IDE, but I would not expect them to be more than 2 times better for a mostly HW operation.
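One check that might narrow this down is to confirm whether the macro is actually visible to the compiler in a given build, for instance because it is also set somewhere other than the line being commented out. The sketch below assumes the project follows the usual mbedTLS convention of naming its configuration header through MBEDTLS_CONFIG_FILE; the two function names are hypothetical, and the check only reports macro visibility in this one source file, not which code path ends up being called.

```c
#include <stdio.h>

/* mbedTLS convention: the build may name its configuration header
 * through MBEDTLS_CONFIG_FILE. Pull it in when it is set, so this
 * file sees the same macros as the library sources (assumption:
 * the project uses this mechanism). */
#if defined(MBEDTLS_CONFIG_FILE)
#include MBEDTLS_CONFIG_FILE
#endif

/* Returns 1 if MBEDTLS_FREESCALE_CASPER_PKHA was defined when this
 * file was compiled, 0 otherwise. */
int casper_pkha_macro_visible(void)
{
#if defined(MBEDTLS_FREESCALE_CASPER_PKHA)
    return 1;
#else
    return 0;
#endif
}

/* Print the result next to the benchmark output (hypothetical helper). */
void report_casper_pkha_macro(void)
{
    printf("MBEDTLS_FREESCALE_CASPER_PKHA: %s\n",
           casper_pkha_macro_visible() ? "defined" : "not defined");
}
```

Calling report_casper_pkha_macro() once before the benchmark runs would show, for each of the four builds, whether the "software" configuration really compiles without the macro.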