DMA, FEC and D-cache coherency

tedwood · ‎10-19-2006

We are using the MCF5475 with the Freescale supplied example code for using the FEC and DMA. With data cache disabled it works fine. When the cache is enabled it fails (not surprisingly) because cache coherency is not maintained during DMA.

I can see two possible solutions to this problem.

1. Invalidate the cache after DMA to memory and flush it after DMA reads.
2. Place the buffer memory in non-cached memory.

I can see how to do 2. I can't work out from looking at the Freescale code (fec.c fecbd.c) where to flush and invalidate the data cache. Somebody must have been here before surely?

All suggestions welcome.

Cheers
TW

TomE · ‎02-02-2010

Here are some more options.

We've found that our code runs FASTER with the cache configured as WRITETHROUGH than with it configured as COPYBACK. Writethrough also solves the problem of transmitting data through the FEC or DMA, since every write goes through to memory. You can also use CACR/CINVA with WRITETHROUGH without losing data, but I'm pretty sure CINVA with COPYBACK would lose any writes that are still sitting in the cache.

If you need to use cached memory for receiving, then you'll need to invalidate the cache lines that the receive buffers are in before starting the hardware. A simpler solution is to receive into buffers in STATIC RAM. You could even have a "quick hack" that receives into SRAM and then lets the CPU copy the data to where you really want it in cached RAM.
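The "quick hack" could look roughly like the sketch below. Everything named here is hypothetical: `rx_bounce` stands in for a buffer that the linker is assumed to place in on-chip SRAM outside the cached region (the placement itself is not shown), and `rx_copy_out` is an illustrative helper, not part of the Freescale example code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical size: room for one Ethernet frame, a multiple of 16 bytes. */
#define RX_BOUNCE_SIZE 1536u

/* Assumption: the linker places this array in on-chip SRAM that is not
 * cached. On a host build it is just ordinary memory. */
static uint8_t rx_bounce[RX_BOUNCE_SIZE];

/* Copy a received frame out of the bounce buffer into (cached) RAM.
 * Returns the number of bytes copied, or 0 if the frame does not fit. */
size_t rx_copy_out(uint8_t *dst, size_t dst_size, size_t frame_len)
{
    if (frame_len > RX_BOUNCE_SIZE || frame_len > dst_size) {
        return 0;
    }
    memcpy(dst, rx_bounce, frame_len);
    return frame_len;
}
```

The cost is one extra CPU copy per frame, which is why this is a hack rather than a final design.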

The most universal solution is to use CPUSHL in INVALIDATE mode for RECEIVE buffers (mode doesn't matter for transmitting).

Fun, isn't it?

thz · ‎10-25-2006

Same problem here... but on an MCF5234, using a split cache configuration.

I'm trying to invalidate the corresponding D-cache line with CPUSHL.
The problem is, CPUSHL is badly documented... so far I'm only
getting trap 11 :(

cheers, Thomas

Rik · ‎10-20-2006

Hi.

I have a similar problem. Like you say, using a non-cached area is one solution (just yesterday I found out how to partition the SDRAM into a cached and a non-cached part).

Invalidating the whole cache has the big disadvantage that ALL cached data is gone, which slows down all other code that fetches data too.

The best would be to invalidate just those cache lines that refer to the DMA'd data, but there seems to be no way to do that. I've written to Freescale about this, no answer yet.
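To put a number on how many lines a DMA buffer actually covers, here is a minimal sketch that counts the cache lines touched by a byte range. It assumes the 16-byte line size quoted elsewhere in this thread; `line_floor` and `lines_touched` are illustrative names, and the code only does address arithmetic, it does not touch the cache.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 16u  /* assumed line size, in bytes */

/* Round an address down to the start of its cache line. */
static uintptr_t line_floor(uintptr_t addr)
{
    return addr & ~(uintptr_t)(CACHE_LINE_SIZE - 1u);
}

/* Number of cache lines touched by the byte range [addr, addr + len). */
size_t lines_touched(uintptr_t addr, size_t len)
{
    if (len == 0) {
        return 0;
    }
    uintptr_t first = line_floor(addr);
    uintptr_t last  = line_floor(addr + len - 1u);
    return (size_t)((last - first) / CACHE_LINE_SIZE) + 1u;
}
```

An aligned 1536-byte buffer covers 96 lines, and a buffer that is not line-aligned can spill into one extra line at each end, which is one reason to keep DMA buffers aligned to the line size.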

Rik.

KenJohnson · ‎01-27-2010

Here is a solution for anyone still searching. FlushDataCacheRegion uses CPUSHL to flush only those data cache lines that could contain data from a specified memory region. I use this on the 5475 to flush tx buffers from the cache prior to setting the ready bit in the buffer descriptor and to flush rx buffers from the cache prior to setting the empty bit in the buffer descriptor. The buffers must be 16-byte aligned, and the size of the buffers must be evenly divisible by 16.

/**
 * Flush and invalidate the specified memory from the data cache.
 *
 * Flushes (and invalidates, by virtue of CACR[DDPI]=0) the specified
 * memory range from the cache. The routine walks the memory range,
 * issuing CPUSHL for each 16-byte line, and repeats the walk for
 * each of the 4 "ways" of the data cache.
 *
 * @param pMem points to the starting address of the region to be flushed
 * from the data cache. If this address is not 16-byte aligned, this function
 * "backs up" to the nearest 16-byte boundary and starts flushing from there.
 *
 * @param len_bytes specifies the size of the region in bytes.
 */
void FlushDataCacheRegion( void *pMem, unsigned long len_bytes )
{
    asm {
        MOVE.L  pMem,D0         ;/* fetch start address */
        MOVEA.L D0,A1           ;/* calculate stop address */
        ADDA.L  len_bytes,A1    ;
        CLR.L   D1              ;/* init way counter */
        ANDI.L  #0xFFFFFFF0,D0  ;/* calculate aligned start address */

FlushDataCacheRegion_wayloop:
        MOVEA.L D0,A0           ;/* initialize A0 (full 32-bit move) */
        ADDA.L  D1,A0           ;/* set way index in the low address bits */

FlushDataCacheRegion_innerloop:
        CPUSHL  DC,(A0)         ;/* flush and invalidate the cache line */
        ADDA.L  #0x10,A0        ;/* increment to next cache line */
        CMPA.L  A0,A1           ;/* done with region? */
        BHI     FlushDataCacheRegion_innerloop ;/* unsigned: loop while A1 > A0 */

        ADDQ.L  #1,D1           ;/* increment way counter */
        ADDQ.L  #1,A1           ;/* update stop address to reflect new way value */
        CMPI.L  #4,D1           ;/* check if all cache ways have been flushed */
        BNE     FlushDataCacheRegion_wayloop ;
    }
}
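
The call order described above (flush first, then hand the buffer to the hardware) might be sketched as follows. The descriptor layout, the bit names `BD_TX_READY` and `BD_RX_EMPTY`, and the helpers `tx_submit` and `rx_release` are hypothetical; take the real definitions from your own fecbd.h. The `FlushDataCacheRegion` here is only a host-side stand-in that counts calls so the sketch compiles off-target.

```c
#include <stdint.h>

/* Hypothetical descriptor bits and layout, for illustration only. */
#define BD_TX_READY 0x8000u
#define BD_RX_EMPTY 0x8000u

typedef struct {
    volatile uint16_t status;
    uint16_t length;
    uint8_t *data;
} bd_t;

/* Host-side stand-in for the real routine; it only counts calls. */
static unsigned flush_calls;
void FlushDataCacheRegion(void *pMem, unsigned long len_bytes)
{
    (void)pMem;
    (void)len_bytes;
    flush_calls++;
}

/* Hand a filled transmit buffer to the hardware. */
void tx_submit(bd_t *bd, uint8_t *buf, uint16_t len)
{
    bd->data = buf;
    bd->length = len;
    /* Push the CPU's writes out to memory before DMA can read them. */
    FlushDataCacheRegion(buf, len);
    bd->status |= BD_TX_READY;
}

/* Give an empty receive buffer back to the hardware. */
void rx_release(bd_t *bd, uint8_t *buf, uint16_t buf_size)
{
    bd->data = buf;
    bd->length = 0;
    /* Evict any cached copy so later reads fetch what DMA wrote. */
    FlushDataCacheRegion(buf, buf_size);
    bd->status |= BD_RX_EMPTY;
}
```

The point of the ordering is that the ready or empty bit is the last thing written, so the hardware never sees a buffer whose cache lines have not been dealt with yet.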
