I'm not sure if I've missed some remark in the reference manual, but there's a thing I don't really get.
When I run the following (custom) script, with r0=8, r1 pointing at the source address and r2 pointing at the destination address:
stf r1, 0x00 # To MSA, NO prefetch, address is incremental
stf r2, 0x04 # To MDA, address is incremental
stf r0, 0x18 # Copy 8 32-bit words
I don't always get 8 copy operations. As it turns out, it depends on the destination address: if it happens to be 4 bytes past a 32-byte boundary (e.g. r2=0xafab0064), the operation performs 7 copies instead of 8. The MS register indicates no error, but shows that 4 bytes were left in the FIFO (e.g. ms=0x00110404 after the operation). MSA and MDA are incremented by 32 and 28 bytes, respectively, which matches what actually took place. So everything is consistent, but wrong.
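To make the arithmetic explicit, here is a small C sketch of the behaviour I observe. It only models the one case described above (an 8-word burst with MDA 4 bytes past a 32-byte boundary) and is not taken from the reference manual. The helper names mda_hits_short_copy and observed_bytes_written are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when the destination address (MDA) sits
 * exactly 4 bytes past a 32-byte boundary, the only offset at which
 * I have seen the short copy. */
static bool mda_hits_short_copy(uint32_t mda)
{
    return (mda & 0x1Fu) == 0x04u;
}

/* Hypothetical model of the observed result: number of bytes that
 * actually reach the destination for a burst of `words` 32-bit words.
 * Assumption: only the 8-word case was tested, so only that case is
 * modelled as losing one word (4 bytes stay in the FIFO). */
static uint32_t observed_bytes_written(uint32_t mda, uint32_t words)
{
    uint32_t requested = words * 4u;

    if (words == 8u && mda_hits_short_copy(mda))
        return requested - 4u;   /* 7 copies instead of 8 */

    return requested;
}
```

With r2=0xafab0064 and r0=8, this gives 28 bytes written for 32 bytes read, which matches the MSA/MDA increments above.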
This is a real problem, since the leftover bytes in the FIFO are wiped out by the next copy command, so data is lost.
I've tried other alignment offsets, but haven't found anything suspicious for anything other than a 4-byte alignment offset in MDA.
I tried this with the prefetch flag both on and off, and it makes no difference as far as the outcome is concerned.
The marking on the device reads MCIMX515DJM8C M77X CTGK1038D (CHINA)