mbedTLS - i.MX RT1052 - DCP - How to handle datacache

mathieu_bordere · 01-17-2019

Hi all

When I use mbedTLS to set up a TLS connection and want the DCP of the i.MX RT1052 MCU to perform the AES and SHA-256 calculations in hardware, I have to disable the data cache for the results to be correct. However, I cannot simply disable the whole data cache, for obvious performance reasons. What are my options for using hardware acceleration with a TLS connection?

- Adapt the DCP AES and SHA functions so that they invalidate/clean the parts of the cache they use? That sounds error-prone, and I would like to avoid it.

Any help is appreciated,

Mathieu

jingpan · 01-24-2019

Hi Mathieu,

When the DCP accesses SDRAM: DMA - SIM_M7 - SEMC - SDRAM.

When the DCP accesses TCM: DMA - SIM_M7 - AXI to AHB - TCM (ITCM/DTCM).

When the DCP accesses OCRAM: DMA - SIM_M7 - OCRAM controller.

None of these paths goes through the I-cache or D-cache.

The problem can arise when the DCP (DMA) modifies data in OCRAM or SDRAM while that memory is cached.

So you do not have to disable the cache when you use the DCP. You need to check whether the DCP writes to a cacheable area. The area the DCP writes to must not be cached.

There is no problem when the DCP writes to TCM, because that area is not cacheable.
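
As a rough illustration of the check described above, the following sketch tests whether a buffer lies entirely inside DTCM before it is handed to the DCP. The base address and the 128 KB size are assumptions for a default FlexRAM layout, and dcp_buffer_in_dtcm and dcp_require_dtcm are hypothetical helper names, not SDK functions. Take the real values from your linker script and FlexRAM configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: DTCM base and size for a default FlexRAM layout.
 * Replace with the values from your linker script. */
#define DTCM_BASE 0x20000000u
#define DTCM_SIZE (128u * 1024u)

/* Returns true when [addr, addr + len) lies entirely inside DTCM. */
bool dcp_buffer_in_dtcm(uint32_t addr, size_t len)
{
    if (addr < DTCM_BASE || len > DTCM_SIZE)
    {
        return false;
    }
    /* len <= DTCM_SIZE here, so the subtraction cannot wrap. */
    return (addr - DTCM_BASE) <= (DTCM_SIZE - len);
}

/* Hypothetical debug-build guard a caller could place in front of a
 * DCP operation. Only meaningful on the 32-bit target. */
void dcp_require_dtcm(const void *buf, size_t len)
{
    assert(dcp_buffer_in_dtcm((uint32_t)(uintptr_t)buf, len));
}
```

A buffer in OCRAM (assumed to start at 0x20200000) or SDRAM (assumed to start at 0x80000000) fails this check, which is the case where the MPU attributes or explicit cache maintenance have to be considered.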

Regards,

Jing

mathieu_bordere · 01-24-2019

Hi jingpan,

Thank you for your reply. Instead of making sure the DCP only accesses non-cacheable memory, I'm trying the following. (FYI: in this reply I refer to the source code in fsl_dcp.[hc] from SDK version 2.5.0 for the i.MX RT1050 board.)

I've adapted dcp_schedule_work to clean or invalidate the data cache lines related to the dcpPacket. It cleans or invalidates the memory area of the packet itself and the memory areas pointed to by the pointers contained in the dcpPacket, namely nextCmdAddress, sourceBufferAddress, destinationBufferAddress and payloadPointer. I've pasted the definition of the _dcp_work_packet struct for your reference.

/*! @brief DCP's work packet. */
typedef struct _dcp_work_packet
{
    uint32_t nextCmdAddress;
    uint32_t control0;
    uint32_t control1;
    uint32_t sourceBufferAddress;
    uint32_t destinationBufferAddress;
    uint32_t bufferSize;
    uint32_t payloadPointer;
    uint32_t status;
} dcp_work_packet_t;
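
One detail that matters for this approach: the Cortex-M7 data cache works on 32-byte lines, so a clean or invalidate by address always covers whole lines. The packet above is 8 x 4 = 32 bytes, exactly one line if it is 32-byte aligned. The sketch below shows how a byte range expands to the cache lines it touches, and how to test whether a buffer owns its lines exclusively. cache_range_expand and cache_range_is_exclusive are hypothetical helper names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DCACHE_LINE_SIZE 32u /* Cortex-M7 data cache line size in bytes */

typedef struct
{
    uint32_t start; /* first byte of the first cache line touched */
    uint32_t size;  /* number of bytes covered, a multiple of 32 */
} cache_range_t;

/* Expand [addr, addr + len) to the whole cache lines it overlaps. */
cache_range_t cache_range_expand(uint32_t addr, uint32_t len)
{
    assert(len > 0u);
    uint64_t mask = ~(uint64_t)(DCACHE_LINE_SIZE - 1u);
    uint64_t start = (uint64_t)addr & mask;
    uint64_t end = ((uint64_t)addr + len + (DCACHE_LINE_SIZE - 1u)) & mask;
    cache_range_t r = {(uint32_t)start, (uint32_t)(end - start)};
    return r;
}

/* True when the buffer starts on a line boundary and fills whole lines,
 * so maintaining it cannot touch neighbouring variables. */
bool cache_range_is_exclusive(uint32_t addr, uint32_t len)
{
    return (len > 0u) && (addr % DCACHE_LINE_SIZE == 0u) &&
           (len % DCACHE_LINE_SIZE == 0u);
}
```

If a source or destination buffer is not exclusive in this sense, invalidating it can also discard pending CPU writes to unrelated data that happens to share its first or last line.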

Can this approach ever work? And if not, why not? Am I overlooking something? FYI, when running the mbedTLS_selftest from SDK 2.5.0, this approach seems to work well, except that hmac_drbg_self_test fails on its first run; subsequent runs pass.

I like this centralized handling of the cache, because then the DCP can be used by client code without worrying about cache coherence.

(The added code is the two calls to dcp_clean_dcpPacket and dcp_invalidate_dcpPacket.)

static status_t dcp_schedule_work(DCP_Type *base, dcp_handle_t *handle, dcp_work_packet_t *dcpPacket)
{
    status_t status;
    /* check if our channel is active */
    if ((base->STAT & (uint32_t)handle->channel) != handle->channel)
    {
        /* disable global interrupt */
        uint32_t currPriMask = DisableGlobalIRQ();
        dcp_clean_dcpPacket(dcpPacket); /* Clean data cache with regard to dcp packet itself and its contents */
        /* re-check if our channel is still available */
        if ((base->STAT & (uint32_t)handle->channel) == 0)
        {
            volatile uint32_t *cmdptr = NULL;
            volatile uint32_t *chsema = NULL;
            switch (handle->channel)
            {
                case kDCP_Channel0:
                    cmdptr = &base->CH0CMDPTR;
                    chsema = &base->CH0SEMA;
                    break;
                case kDCP_Channel1:
                    cmdptr = &base->CH1CMDPTR;
                    chsema = &base->CH1SEMA;
                    break;
                case kDCP_Channel2:
                    cmdptr = &base->CH2CMDPTR;
                    chsema = &base->CH2SEMA;
                    break;
                case kDCP_Channel3:
                    cmdptr = &base->CH3CMDPTR;
                    chsema = &base->CH3SEMA;
                    break;
                default:
                    break;
            }
            if (cmdptr && chsema)
            {
                /* set out packet to DCP CMDPTR */
                *cmdptr = (uint32_t)dcpPacket;
                /* set the channel semaphore */
                *chsema = 1u;
            }
            status = kStatus_Success;
        }
        else
        {
            status = kStatus_DCP_Again;
        }
        dcp_invalidate_dcpPacket(dcpPacket); /* Invalidate data cache with regard to dcp packet itself and its contents */
        /* global interrupt enable */
        EnableGlobalIRQ(currPriMask);
    }
    else
    {
        return kStatus_DCP_Again;
    }
    return status;
}
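
One thing worth checking in the function above: the invalidate runs right after the channel semaphore is written, which may be before the DCP has finished writing its output. A commonly used ordering for DMA-style engines is to clean the input, discard any cached copy of the output buffer, start the engine, wait for completion, and only then invalidate the output. The sketch below only demonstrates that ordering on a host. cache_clean, cache_invalidate, dcp_start and dcp_wait_done are hypothetical stand-ins that log their calls. On the target they would map to something like the CMSIS SCB_CleanDCache_by_Addr and SCB_InvalidateDCache_by_Addr functions and the driver's wait-for-completion call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Events recorded by the hypothetical stand-ins below. */
enum { EV_CLEAN, EV_INVALIDATE, EV_START, EV_WAIT };

int event_log[8];
size_t event_count;

static void log_event(int ev)
{
    assert(event_count < sizeof(event_log) / sizeof(event_log[0]));
    event_log[event_count++] = ev;
}

/* HYPOTHETICAL stand-ins: they only record the call order. */
void cache_clean(const void *addr, size_t len) { (void)addr; (void)len; log_event(EV_CLEAN); }
void cache_invalidate(void *addr, size_t len) { (void)addr; (void)len; log_event(EV_INVALIDATE); }
void dcp_start(void) { log_event(EV_START); }
void dcp_wait_done(void) { log_event(EV_WAIT); }

/* One DCP operation with cache maintenance around it. */
void dcp_run_coherent(const uint8_t *src, uint8_t *dst, size_t len)
{
    cache_clean(src, len);      /* push CPU-written input out to RAM */
    cache_invalidate(dst, len); /* drop cached lines so none is evicted over the result */
    dcp_start();                /* hand the work packet to the engine */
    dcp_wait_done();            /* the output is complete only after this */
    cache_invalidate(dst, len); /* make the CPU re-read the result from RAM */
}
```

With this ordering, the cache handling can still live in one central place, but it has to span both the scheduling step and the completion step rather than sit entirely inside dcp_schedule_work.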

jingpan · 01-30-2019

Hi,

I'm not sure how you are using it, but you should always be careful when placing data in OCRAM or SDRAM. If performance is your focus, you should weigh the benefit of putting the data in OCRAM against the cost of frequently disabling the D-cache.

Regards,

Jing