Hello Yoshitaka-san,
The routine init_dma_tcd_15() is written in a form that is convenient for learning: each register bit or field is set by its own C statement. Because peripheral registers such as the DMA registers are declared volatile, the compiler cannot merge these bit accesses into a single 16/32-bit load/store.
Therefore I'd recommend replacing the multiple accesses to the same peripheral register with a single register assignment, e.g.:
...
DMA_0.TCD[15].CSR.B.MAJORELINK = 0;
DMA_0.TCD[15].CSR.B.ESG = 0;
DMA_0.TCD[15].CSR.B.BWC = 0;
DMA_0.TCD[15].CSR.B.INTHALF = 0;
....
you can replace with
DMA_0.TCD[15].CSR.R = 0;
Instead of many individual load / bit-clear / store sequences, you get a single full-register store.
This should significantly improve the speed of this specific routine.
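To make the difference concrete, here is a minimal sketch that can be compiled on a PC. The type mock_csr_t and the functions clear_by_fields() and clear_by_register() are hypothetical stand-ins, not the real MPC5744P header; the field widths and positions are illustrative only, and REST is a placeholder for the fields elided above.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 16-bit peripheral register. Field names echo
 * the snippet above, but widths and positions are illustrative only; the
 * real layout comes from the device header. */
typedef union {
    uint16_t R;                   /* whole-register view */
    struct {
        uint16_t MAJORELINK : 1;
        uint16_t ESG        : 1;
        uint16_t BWC        : 2;
        uint16_t INTHALF    : 1;
        uint16_t REST       : 11; /* placeholder for the remaining fields */
    } B;                          /* bit-field view */
} mock_csr_t;

static volatile mock_csr_t mock_csr;

/* One volatile access sequence per statement: because the object is
 * volatile, the compiler must keep every one of them. */
uint16_t clear_by_fields(uint16_t initial)
{
    mock_csr.R = initial;
    mock_csr.B.MAJORELINK = 0;
    mock_csr.B.ESG = 0;
    mock_csr.B.BWC = 0;
    mock_csr.B.INTHALF = 0;
    mock_csr.B.REST = 0;
    return mock_csr.R;
}

/* Same end state reached with a single volatile write. */
uint16_t clear_by_register(uint16_t initial)
{
    mock_csr.R = initial;
    mock_csr.R = 0;
    return mock_csr.R;
}
```

Both functions leave the register at zero. Note that the two forms are only equivalent when every field of the register is meant to be written; if some fields must keep a non-zero value, build the complete value first and then assign it to .R once.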
Hope it helps.
Stan
Hi, Stan-san,
Hi Yoshitaka-san,
Thanks for the update.
I don't think this routine is a good candidate for benchmarking.
NOTE: See the application note below for MPC5744P optimization tips:
https://www.nxp.com/docs/en/application-note/AN4939.pdf
Hope it helps.
Stan
Hi Stan-san,
Thank you for your reply.
Sorry, I should have described the details more.
I want to make accesses to the MPC5744P's peripheral registers faster.
I also tried the Debug-RAM build, and in addition I enabled the cache.
The result was 9.625 us, which is slower than the 9.465 us of the Debug (flash-based) build.
Therefore, I think the flash memory wait states are not the cause.
My understanding now is that the only option is to reduce the number of peripheral register access instructions.
Is there any other way?