iMX8MM CAAM errors when using 'tk(cbc(aes))' for filesystem encryption


1,578 Views
djs2
Contributor II

I'm getting the following error when writing to a filesystem using the CAAM for filesystem encryption with `tk(cbc(aes))`.

caam_jr 30902000.jr: 4000141c: DECO: desc idx 20: DECO Watchdog timer timeout error

This only happens occasionally but seems to be more prevalent when running with all cores enabled.
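
For anyone reading along, here is a rough sketch of how the status word `4000141c` in that message breaks down. The field layout is an assumption taken from my reading of the mainline CAAM driver (`drivers/crypto/caam/error.c` and `regs.h`), so please check it against your own kernel tree; the function and constant names are made up for illustration.

```python
# Field layout assumed from the mainline CAAM driver's error decoding;
# verify against the kernel tree you are actually running.
SOURCE_SHIFT = 28             # top nibble: which block reported the status
SOURCE_DECO = 0x4             # assumed source code for DECO errors
DECO_JUMP_FLAG = 0x04000000   # assumed: index refers to a jump target
DECO_INDEX_MASK = 0x0000FF00  # descriptor index of the failing command
DECO_ERROR_MASK = 0x000000FF  # DECO error identifier
DECO_WATCHDOG_TIMEOUT = 0x1C  # assumed id for the watchdog timeout


def decode_deco_status(status: int) -> dict:
    """Split a job ring status word into its DECO error fields."""
    source = (status >> SOURCE_SHIFT) & 0xF
    if source != SOURCE_DECO:
        raise ValueError(f"status {status:#010x} is not a DECO error")
    return {
        "jump": bool(status & DECO_JUMP_FLAG),
        "desc_idx": (status & DECO_INDEX_MASK) >> 8,
        "error_id": status & DECO_ERROR_MASK,
    }


fields = decode_deco_status(0x4000141C)
print(fields["desc_idx"], hex(fields["error_id"]))
```

Under those assumptions the word decodes to descriptor index 20 and error id 0x1c, which lines up with the "desc idx 20" and "DECO Watchdog timer timeout error" text in the log line.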

9 Replies

1,304 Views
djs2
Contributor II

Unfortunately, I've been unable to reproduce this with other tools.

I have added logging of the last 2048 CAAM log messages on failure, and we are now waiting for the failure to recur in CI. I'll send logs as soon as I get them.


1,525 Views
Harvey021
NXP TechSupport

Can you please share the BSP version you are working with, along with the steps and logs from when the problem occurs?

 

Regards

Harvey


1,379 Views
djs2
Contributor II

Sorry for the delay in replying.

We are using `linux-imx_5.15.71_2.2.2-phy5` from Phytec with patches from https://github.com/Freescale/linux-fslc/tree/5.15-2.2.x-imx up until 5.15.183.  Unfortunately the problem only occurs occasionally (less than 1 instance every 500 hours or so of CI testing across multiple units) and I haven't been able to create a simple reproducer.

An initial attempt to enable `CONFIG_CRYPTO_DEV_FSL_CAAM_DEBUG` prevents our device from booting as we are using the CAAM to encrypt the root filesystem along with various data partitions and this generates too much logging.

I'm looking at adding log information to a circular buffer and emitting this when the error occurs.  As this will only result in the last 1000 or so records being emitted, I'd like to know whether there are any setup messages that we should always log to support analysis.
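
To make the idea concrete, here is a minimal sketch of the approach in Python rather than kernel C. It only illustrates the data structure and is not the actual driver change; the class name, the `setup` flag, and the default capacity are all hypothetical. Setup messages are pinned so they survive even when the ring wraps.

```python
from collections import deque


class LogRing:
    """Keep the most recent records plus any pinned setup records."""

    def __init__(self, capacity=1000):
        # deque with maxlen silently drops the oldest entry when full.
        self._recent = deque(maxlen=capacity)
        # Setup records are never evicted (hypothetical design choice).
        self._setup = []

    def record(self, message, setup=False):
        if setup:
            self._setup.append(message)
        else:
            self._recent.append(message)

    def dump(self):
        """Return pinned setup records followed by the recent tail."""
        return list(self._setup) + list(self._recent)


ring = LogRing(capacity=3)
ring.record("job ring initialised", setup=True)
for n in range(5):
    ring.record(f"request {n}")
print(ring.dump())
```

On an error, `dump()` would be emitted in one go, so the log cost is only paid when the failure actually happens.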

Thanks

Daniel


1,309 Views
Harvey021
NXP TechSupport

The watchdog timeout error is triggered when the DECO halts, but several conditions can cause the DECO to halt, such as a problem with the input/output buffer address or length.
Can you reproduce this with a stress test using the "dd" or "fio" tool?

If the issue can be reproduced reliably, it will help us find the root cause.
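
If dd or fio are not convenient on the target, a rough stand-in for this kind of stress test could look like the Python sketch below. It is not a replacement for either tool: it simply runs several threads that write, sync, and read back files in a directory on the encrypted filesystem. The function name and all parameters are hypothetical, and the read-back may be served from the page cache rather than going through decryption.

```python
import os
import threading


def stress_files(directory, workers=4, files_per_worker=16, size=256 * 1024):
    """Write, fsync, read back and delete files from several threads.

    Returns the paths whose read-back data did not match what was written.
    """
    mismatches = []

    def worker(worker_id):
        for i in range(files_per_worker):
            path = os.path.join(directory, f"stress-{worker_id}-{i}.bin")
            payload = os.urandom(size)
            with open(path, "wb") as fh:
                fh.write(payload)
                fh.flush()
                os.fsync(fh.fileno())  # force the data through the block layer
            with open(path, "rb") as fh:
                if fh.read() != payload:
                    mismatches.append(path)
            os.remove(path)

    threads = [threading.Thread(target=worker, args=(n,)) for n in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return mismatches
```

It would be called with a directory on the encrypted partition, for example `stress_files("/mnt/data")`, where that mount point is only a placeholder.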

 

Regards

Harvey


1,129 Views
djs2
Contributor II

I finally had it fail with logging enabled.  This should include the last 2048 log records from the CAAM subsystem.  The only difference from standard logging is that the `src` and `dst` buffer data are not included.


531 Views
AldoG
NXP TechSupport

Hello,

Were you able to reproduce this using dd or fio as Harvey recommended?
In your last test, how easy was it to reproduce?

Best regards/Saludos,
Aldo.


317 Views
djs2
Contributor II

No, I haven't been able to reproduce with dd or fio.  On the system I captured the previous log from, it only happens very occasionally (once in 2 months so far). 

On another system with slightly different code, it happens at least once a day.  We believe this occurs when loading a large set of shared libraries during startup (which aren't used on the system I got logs from).  Unfortunately, we are not able to easily collect logs from this version; however, we would be able to test a patch relatively quickly to see if the issue is resolved.

Do the previous logs contain enough information for investigation purposes?  If not then what else would be required?


310 Views
djs2
Contributor II

From a brief look at my log, it appears that at the point of failure, the 8th of 8 queued requests for a sequence of offsets is what generates the DECO watchdog timeout error.  In the earlier portions of the log, it appears that there are rarely any queued requests (possibly sometimes one?) even when handling other sequences of offsets.  Is this a clue?

The 7 queued requests before this do seem to complete correctly, so could one of the following be the cause?

  1. The queue actually only supports 7 entries - in which case reducing the number of queued entries may help (where can I change this?)
  2. The DECO watchdog timeout starts when entries are added to the queue and simply expires due to the time taken to handle 8 entries - in which case extending the timeout period may help (again, if possible, where can I change this?)
  3. This specific request actually has a problem - but to me it looks equivalent to the 7 previous requests so this seems unlikely
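
For reference, the "8 queued requests" observation can be checked mechanically with something like the sketch below, which tracks how many requests are outstanding after each log event. The `(kind, request_id)` event format is hypothetical and would need to be extracted from the real CAAM log records first.

```python
def queue_depths(events):
    """Return the number of outstanding requests after each event.

    `events` is an iterable of (kind, request_id) pairs, where kind is
    "enqueue" or "complete" (a hypothetical, simplified log format).
    """
    outstanding = set()
    depths = []
    for kind, request_id in events:
        if kind == "enqueue":
            outstanding.add(request_id)
        elif kind == "complete":
            outstanding.discard(request_id)
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
        depths.append(len(outstanding))
    return depths


events = [("enqueue", 1), ("enqueue", 2), ("complete", 1), ("enqueue", 3)]
print(max(queue_depths(events)))
```

Comparing the maximum depth just before the failure with the maximum seen across the rest of the log would show whether a depth of 8 is really unusual.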

Thanks

Daniel


309 Views
djs2
Contributor II

Note that we are running the CPU and DDR at reduced speed for power-saving reasons, which may have an impact on this.
