On an iMX7D, using Linux /dev/mem and mmap, I am able to achieve a maximum GPIO toggle rate of 10 Mhz from the (master) A7 core running at 1GHz (i.e. each GPIO transition requires ~ 50 ns). However, when using direct (ASM) access from the Cortex-M4 (running at 238 MHz), the maximum achievable toggle rate drops to ~ 3.3 Mhz - the actual STR instruction writing the bit toggle to the GPIO4 data register requires a full 150 ns to complete (as measured by SYST_CVR reads before and after).
I have enabled Cortex-M4 LMEM caching and programmed the Cortex-M4 MPU to treat the GPIO4 memory space as NON-SHAREABLE, DEVICE MEMORY without any improvement in the toggle rate. Is there any way to improve the GPIO access speed from the Cortex-M4 core , or is this a fundamental limitation of the architecture ?