When using mbedTLS to set up a TLS connection on the i.MX RT1052 MCU, I want to use the DCP peripheral to perform the AES and SHA-256 calculations in hardware. For the results to be correct I currently need to disable the data cache, but disabling the whole data cache is not an option for obvious performance reasons. So what are my options for using HW acceleration for a TLS connection?
- Adapt the DCP AES and SHA functions so that they clean/invalidate the parts of the cache they use? This sounds like an error-prone option that I would like to avoid.
Any help is appreciated,
When the DCP accesses SDRAM: DMA - SIM_M7 - SEMC - SDRAM.
When the DCP accesses TCM: DMA - SIM_M7 - AXI-to-AHB - TCM (ITCM/DTCM).
When the DCP accesses OCRAM: DMA - SIM_M7 - OCRAM controller.
In none of these paths is the I-cache or D-cache involved.
The problem arises when the DCP (DMA) modifies data in OCRAM or SDRAM memory while the cache is enabled.
So you do not have to disable the cache to use the DCP. You need to check whether the DCP writes to a cacheable area or not; the area the DCP writes to must not be cached.
There is no problem when the DCP writes to TCM, because that area is not cacheable.
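One way to follow this advice without touching the driver is to place every buffer the DCP reads or writes in a non-cacheable region. A sketch using the SDK's placement macros (AT_NONCACHEABLE_SECTION_ALIGN from fsl_common.h; the non-cacheable section must exist in your linker file, and the variable names and the 16-byte alignment here are illustrative assumptions, not SDK requirements):

```c
#include "fsl_common.h"
#include "fsl_dcp.h"

/* Place the DCP work packet and the data buffers in the non-cacheable
 * section defined by the linker script, so DCP DMA and the CPU always
 * see the same data without any cache maintenance. */
AT_NONCACHEABLE_SECTION_ALIGN(static dcp_work_packet_t s_dcpPacket, 16);
AT_NONCACHEABLE_SECTION_ALIGN(static uint8_t s_dcpInput[512], 16);
AT_NONCACHEABLE_SECTION_ALIGN(static uint8_t s_dcpOutput[512], 16);
```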
Thank you for your reply. Instead of making sure the DCP only accesses non-cacheable memory, I'm trying the following. (FYI: in this reply I will be referring to source code found in fsl_dcp.[hc] in SDK version 2.5.0 for the i.MX RT1050 board.)
I've adapted dcp_schedule_work to clean or invalidate the data cache lines associated with the dcpPacket: it cleans or invalidates the memory area of the packet itself, and the memory areas pointed to by the pointers contained in the dcpPacket, namely nextCmdAddress, sourceBufferAddress, destinationBufferAddress & payloadPointer. I've pasted the definition of the _dcp_work_packet struct below for your reference.
/*! @brief DCP's work packet. */
typedef struct _dcp_work_packet
{
    uint32_t nextCmdAddress;
    uint32_t control0;
    uint32_t control1;
    uint32_t sourceBufferAddress;
    uint32_t destinationBufferAddress;
    uint32_t bufferSize;
    uint32_t payloadPointer;
    uint32_t status;
} dcp_work_packet_t;
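One detail worth noting for this approach: on the Cortex-M7 the D-cache line is 32 bytes, so every clean/invalidate must cover whole lines, and a packet field that shares a line with unrelated data is exactly where by-address invalidation becomes dangerous. A minimal host-side sketch of the rounding that helpers like my dcp_clean_dcpPacket/dcp_invalidate_dcpPacket need to do (function name is mine, not from the SDK):

```c
#include <stdint.h>

#define DCACHE_LINE_SIZE 32u /* Cortex-M7 data cache line size in bytes */

/* Round [addr, addr + size) outward to whole cache lines, as required by
 * the CMSIS by-address maintenance functions. Returns the aligned start
 * address and writes the aligned length to *alignedSize. */
static uint32_t cache_aligned_range(uint32_t addr, uint32_t size,
                                    uint32_t *alignedSize)
{
    uint32_t start = addr & ~(DCACHE_LINE_SIZE - 1u);
    uint32_t end = (addr + size + DCACHE_LINE_SIZE - 1u) & ~(DCACHE_LINE_SIZE - 1u);

    *alignedSize = end - start;
    return start;
}
```

Anything else that happens to live in those rounded-out lines is cleaned or invalidated along with the packet data, which is why aligning the buffers themselves to 32 bytes is safer.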
Can this approach ever work? And if not, why not? Am I overlooking something? FYI, when running the mbedTLS_selftest from SDK 2.5.0 this approach seems to work well, except that it fails the first time hmac_drbg_self_test runs; subsequent runs pass.
I like this centralized handling of the cache, because then the DCP can be used by client code without worrying about cache coherence.
(the added calls are the ones marked with "Clean data cache" / "Invalidate data cache" comments)
static status_t dcp_schedule_work(DCP_Type *base, dcp_handle_t *handle, dcp_work_packet_t *dcpPacket)
{
    status_t status;

    /* check if our channel is active */
    if ((base->STAT & (uint32_t)handle->channel) != handle->channel)
    {
        /* disable global interrupt */
        uint32_t currPriMask = DisableGlobalIRQ();

        dcp_clean_dcpPacket(dcpPacket); /* Clean data cache with regard to dcp packet itself and its contents */

        /* re-check if our channel is still available */
        if ((base->STAT & (uint32_t)handle->channel) == 0)
        {
            volatile uint32_t *cmdptr = NULL;
            volatile uint32_t *chsema = NULL;

            switch (handle->channel)
            {
                case kDCP_Channel0:
                    cmdptr = &base->CH0CMDPTR;
                    chsema = &base->CH0SEMA;
                    break;
                case kDCP_Channel1:
                    cmdptr = &base->CH1CMDPTR;
                    chsema = &base->CH1SEMA;
                    break;
                case kDCP_Channel2:
                    cmdptr = &base->CH2CMDPTR;
                    chsema = &base->CH2SEMA;
                    break;
                case kDCP_Channel3:
                    cmdptr = &base->CH3CMDPTR;
                    chsema = &base->CH3SEMA;
                    break;
                default:
                    break;
            }

            if (cmdptr && chsema)
            {
                /* set out packet to DCP CMDPTR */
                *cmdptr = (uint32_t)dcpPacket;
                /* set the channel semaphore */
                *chsema = 1u;
            }
            status = kStatus_Success;
        }
        else
        {
            status = kStatus_DCP_Again;
        }

        /* global interrupt enable */
        EnableGlobalIRQ(currPriMask);

        dcp_invalidate_dcpPacket(dcpPacket); /* Invalidate data cache with regard to dcp packet itself and its contents */
    }
    else
    {
        return kStatus_DCP_Again;
    }

    return status;
}
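For reference, the logic inside my dcp_clean_dcpPacket/dcp_invalidate_dcpPacket helpers amounts to walking the packet and every region it references, and applying one cache operation to each. Below is a host-testable sketch of that walk; the struct mirrors the SDK packet layout, the cache_op_t callback stands in for the CMSIS SCB_CleanDCacheByAddr()/SCB_InvalidateDCacheByAddr() calls used on target, and the payload size parameter is my assumption (the packet itself does not record how large the payload area is):

```c
#include <stdint.h>

/* Same word layout as the SDK's dcp_work_packet_t. */
typedef struct
{
    uint32_t nextCmdAddress;
    uint32_t control0;
    uint32_t control1;
    uint32_t sourceBufferAddress;
    uint32_t destinationBufferAddress;
    uint32_t bufferSize;
    uint32_t payloadPointer;
    uint32_t status;
} dcp_packet_t;

/* Stand-in for the CMSIS by-address clean/invalidate call. */
typedef void (*cache_op_t)(uint32_t addr, uint32_t size);

/* Apply one cache operation to the packet itself and to every non-NULL
 * region it points to; returns the number of regions visited. The real
 * helper must also round each region out to whole 32-byte cache lines. */
static unsigned dcp_for_each_packet_region(const dcp_packet_t *p,
                                           uint32_t payloadSize,
                                           cache_op_t op)
{
    unsigned n = 0u;

    op((uint32_t)(uintptr_t)p, sizeof(*p));
    n++;
    if (p->sourceBufferAddress != 0u) { op(p->sourceBufferAddress, p->bufferSize); n++; }
    if (p->destinationBufferAddress != 0u) { op(p->destinationBufferAddress, p->bufferSize); n++; }
    if (p->payloadPointer != 0u) { op(p->payloadPointer, payloadSize); n++; }
    if (p->nextCmdAddress != 0u) { op(p->nextCmdAddress, sizeof(*p)); n++; }
    return n;
}

/* Trivial operation used to exercise the walk on a host machine. */
static void count_only(uint32_t addr, uint32_t size) { (void)addr; (void)size; }
```

The unknown payload size is one reason a fully centralized scheme is hard to get right: for AES the payload holds the key/IV, for hashing it holds the running context, so the helper has to know which operation the packet describes.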
I'm not sure how you use it, but you should always be careful when placing data in OCRAM or SDRAM. And if you are focused on performance, you should weigh the benefit of placing the data in OCRAM against the cost of the frequent D-cache maintenance.