Hello,
It looks like non-cached reading from OCRAM is ~12x slower than writing on the CM7 core.
In our application non-cached read/write performance is important because we have many DMA buffers.
Here are some figures:
CM7 core:
SystemCoreClock = 996000000 Hz, MPU disabled, DCache disabled
               READ       WRITE
DTCM:    1762252602   963536823   words/s
OCRAM1:    41535756   480395668   words/s
CM4 core:
SystemCoreClock = 392727258 Hz, MPU disabled, DCache disabled
               READ       WRITE
DTCM:     321593532   348561397   words/s
OCRAM1:    21156189    17457387   words/s
Notes:
1 word = 4 bytes = 32 bits
Test code executes from ITCM.
In either case, based on the respective CPU clock, DTCM performance makes sense to me.
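To make the figures easier to compare, they can be converted into CPU cycles per word by dividing the core clock by the measured throughput. The helpers below are only a sketch of that arithmetic; the function names are made up for illustration and are not part of the test code.

```c
#include <assert.h>

/* CPU cycles spent per 32-bit word: core clock (Hz) / throughput (words/s). */
double cycles_per_word(double core_hz, double words_per_s)
{
    assert(core_hz > 0.0 && words_per_s > 0.0);
    return core_hz / words_per_s;
}

/* How many times slower the first throughput is than the second one. */
double slowdown(double slow_words_per_s, double fast_words_per_s)
{
    assert(slow_words_per_s > 0.0);
    return fast_words_per_s / slow_words_per_s;
}
```

With the numbers above, an OCRAM1 read on the CM7 costs roughly 24 core cycles per word against roughly 2.1 for a write (a ratio of about 11.6). On the CM4 it is roughly 18.6 cycles per read and 22.5 per write. DTCM comes out at about 0.57 / 1.03 cycles (read / write) on the CM7 and about 1.22 / 1.13 on the CM4.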
Questions:
1) Why is read access much slower than write access?
2) Why is access to OCRAM from CM4 core much slower than CM7?
Thanks in advance for any comments.
Udo
Hi @udoeb ,
1. The M7 core accesses OCRAM1 and OCRAM2 over the AXI bus, which is controlled by the NIC-301 AXI arbiter IP. The ARM core accesses the AXI bus through a pipelined mechanism. Writes are pipelined: the store instruction completes without waiting for the data to actually be placed on the bus. A read instruction, however, must wait until the data comes back, so the pipeline does not help. This is why writing to OCRAM is much faster than reading.
2. This is because the CM4 and CM7 take different paths to OCRAM1 and OCRAM2. Please see Figure 2-2 in the reference manual. The CM4 requests data from OCRAM through the XB (LPSR domain, AHB protocol) and then through the NIC (WAKEUPMIX domain, AXI protocol), and the clock limitation is BUS / BUS_LPSR. Both OCRAMs are accessible only via the SYSTEM bus (so in this case no Harvard-style access is possible). If any other bus masters are accessing the same memory (OCRAM1 or OCRAM2), performance degrades even further due to arbitration (on the XB or NIC).
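Point 1 can be illustrated with a toy cost model. This is only a sketch and not a description of the real NIC-301 behaviour: the function names are hypothetical, and the 2-cycle issue cost and 24-cycle round trip used in the note below are assumed values, picked to be in the range that the CM7 figures in the first post suggest (996 MHz divided by the measured words/s).

```c
#include <assert.h>

/* Toy model of posted (pipelined) writes: the core pays only the issue
 * cost per store; the bus latency overlaps with later stores and is
 * paid once, after the last store has been issued. */
unsigned long posted_write_cycles(unsigned long n, unsigned issue, unsigned latency)
{
    assert(issue > 0);
    return n ? n * issue + latency : 0;
}

/* Toy model of blocking reads: each load waits for the full round trip
 * before the next load can start, so the latency is paid n times. */
unsigned long blocking_read_cycles(unsigned long n, unsigned latency)
{
    return n * latency;
}
```

For 1000 words with an assumed issue cost of 2 cycles and a round trip of 24 cycles, the model gives 2024 cycles for the writes and 24000 cycles for the reads, which is the same order of difference as in the measurement.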
Regards,
Jing
Hi @jingpan,
Thanks for your feedback. I understand. More questions:
3) Is there anything we can do to improve M4 access to OCRAM?
4) Is our clock setup optimal? BUS_CLK, BUS_LPSR_CLK and M4_CLK come from SYS_PLL3 (480 MHz), while M7_CLK and AXI_CLK come from ARM_PLL. Some values from the generated clock_config.c are shown below.
...
- {id: ARM_PLL_CLK.outFreq, value: 996 MHz}
- {id: AXI_CLK_ROOT.outFreq, value: 996 MHz}
- {id: M7_CLK_ROOT.outFreq, value: 996 MHz}
- {id: BUS_CLK_ROOT.outFreq, value: 240 MHz}
- {id: BUS_LPSR_CLK_ROOT.outFreq, value: 160 MHz}
- {id: M4_CLK_ROOT.outFreq, value: 4320/11 MHz}
...
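For reference, the measured throughput can also be expressed as time per access and as cycles of the bus clocks listed above. This is only a sketch of the arithmetic, with hypothetical helper names; it assumes the words/s figures from my first post and says nothing about where in the path the time is actually spent.

```c
#include <assert.h>

/* Average time per 32-bit access in nanoseconds, from a throughput in words/s. */
double ns_per_word(double words_per_s)
{
    assert(words_per_s > 0.0);
    return 1e9 / words_per_s;
}

/* The same time expressed in cycles of a clock running at clock_mhz. */
double bus_cycles_per_word(double words_per_s, double clock_mhz)
{
    return ns_per_word(words_per_s) * clock_mhz / 1000.0;
}
```

For the CM4, an OCRAM1 read (21156189 words/s) takes about 47.3 ns, which corresponds to roughly 7.6 cycles of BUS_LPSR_CLK at 160 MHz or roughly 11.3 cycles of BUS_CLK at 240 MHz.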
Hi @udoeb ,
There is little you can do in hardware to improve read speed. Please refer to AN12437 to see whether a software approach can help.
Regards,
Jing