I'm getting CAAM crypto hardware errors when using an IPsec VPN with AES encryption:
caam_jr 1720000.jr: 40002d1c: DECO: desc idx 45: DECO Watchdog timer timeout error
Different IPsec clients have been tested with the same result. AES is the only encryption method that causes this; other methods such as 3DES work.
My setup:
The VPN client is configured to tunnel traffic from the eth2 interface. The error only occurs when pinging from the device with the eth2 address as the ping source:
ping -I <eth2 address> <internet ping target>
Traffic from the eth2 interface is encrypted and goes out through the wlan0 interface. Only pings that originate from the ls1021a itself with the eth2 source address cause errors and packet loss.
Pings and traffic work if eth1 wan is used.
Also, if hardware crypto is disabled, there are no errors and everything works as it should.
Something about the traffic originating from the ls1021a device itself is causing the crypto hardware errors.
Attached is debug output from CAAM. The interesting part is at the beginning of each debug file, where the aead_encrypt_done function is first called. After the encryption fails, CAAM starts encrypting again, but cryptlen has increased by 80. Cryptlen grows by 80 after each failed attempt, and the encryption fails 13 times before giving up.
The same thing happens with eth1 wan when pinging with a packet size greater than 1400:
ping -s 1400 -I <eth2 address> <internet ping target>
This should be easy to test with any setup with IPsec and CAAM hardware.
1. The "DECO Watchdog timer timeout error" might be caused by a timing issue in the CAAM descriptor used for IPsec encryption.
A fix is available here (currently under review):
crypto: caam - fix concurrency issue in givencrypt descriptor - Patchwork
2. Regarding "After failing at encryption, CAAM starts to encrypt again but cryptlen has increased by 80 and continues to fail 13 times before giving up":
The root cause seems to be the following: in case of a failure, the CAAM driver does not return the correct error code to the networking stack. This causes the networking stack to encapsulate (IPsec ESP) the resulting packet, which is bigger than the original one, again and again until it exceeds the MTU size, at which point xfrm gives up.
More exactly, the resume path (from crypto to the networking stack) is: esp_output_done() -> xfrm_output_resume() -> xfrm_output_one(..., err). Since err is incorrect (a positive number representing the CAAM HW status instead of a negative errno, e.g. -EINVAL), xfrm_output_one() does not jump to the "resume" label and re-encapsulates the packet.
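For illustration, the sign problem described above can be modelled in a few lines of user-space C. This is only a sketch, not the kernel code and not the patch under review: status_to_errno() and takes_resume_path() are hypothetical names, the "err <= 0" test is an assumed simplification of the check in the resume path, and mapping every nonzero status to -EINVAL is an assumption made to keep the example short.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical helper: turn a raw job ring status word into a
 * kernel-style return code (0 on success, negative errno on failure).
 * Assumption: every nonzero status is reported as -EINVAL. */
static int status_to_errno(unsigned int hw_status)
{
    return hw_status ? -EINVAL : 0;
}

/* Simplified model (assumption) of the check in the resume path:
 * only a zero or negative err is treated as "crypto step finished".
 * A positive err falls through and the packet is encapsulated again. */
static int takes_resume_path(int err)
{
    return err <= 0;
}

/* Small self-check using the status word from the log in this thread. */
static void demo(void)
{
    unsigned int hw_status = 0x40002d1cu;

    /* Raw status passed through unchanged: positive, so no resume. */
    assert(!takes_resume_path((int)hw_status));

    /* Status converted to a negative errno: resume path is taken. */
    assert(takes_resume_path(status_to_errno(hw_status)));
}
```

Calling demo() from a main() exercises both cases: the raw status word is positive when passed through as an int, while the converted value is negative and would let the caller see the failure.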
A fix is available here (also under review):
[v3,02/14] crypto: caam - fix return code in completion callbacks - Patchwork
I tested a workaround: removing the authenc drivers that use AES from caamalg.c. Pings and traffic now work normally with AES encryption as well.
Normally hardware crypto (CAAM) uses the aead_givencrypt(), aead_encrypt() and aead_decrypt() functions. If caamalg.c is modified so that all authenc drivers with AES are removed, CAAM uses ablkcipher and ahash instead of aead, and everything works correctly. 3DES and aes-gcm are still handled by the aead functions.
The driver registered for AES in /proc/crypto is:
driver : authenc(hmac-sha512-caam,cbc-aes-caam)
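To check which CAAM authenc drivers are registered before and after such a change, the "driver" lines of /proc/crypto can be filtered. The following is a minimal sketch; is_caam_authenc_driver() and list_caam_authenc() are hypothetical helper names, and the match on "authenc(" plus "-caam" is an assumption based on the driver name shown above.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if the line is a "driver" entry that names an authenc
 * implementation backed by CAAM, 0 otherwise. */
static int is_caam_authenc_driver(const char *line)
{
    /* Skip leading whitespace. */
    while (*line == ' ' || *line == '\t')
        line++;

    /* Only "driver : ..." lines are of interest. */
    if (strncmp(line, "driver", 6) != 0)
        return 0;

    return strstr(line, "authenc(") != NULL &&
           strstr(line, "-caam") != NULL;
}

/* Copies every matching line from in to out and returns the number
 * of matches. */
static int list_caam_authenc(FILE *in, FILE *out)
{
    char line[256];
    int count = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        if (is_caam_authenc_driver(line)) {
            fputs(line, out);
            count++;
        }
    }
    return count;
}
```

To use it on the target, open /proc/crypto with fopen() and pass the stream together with stdout to list_caam_authenc(). With the workaround applied, the AES authenc entries would be expected to disappear from the output.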
So aead for AES does not work in some situations, but ablkcipher with ahash does. I have not noticed any difference in performance with the workaround.
Has anybody else noticed similar issues with aead and CAAM hardware?