iMXRT1176: poor OCRAM read performance

udoeb
Contributor II

Hello,

It looks like (non-cached) reading from OCRAM is ~12x slower than writing.
In our application non-cached read/write performance is important because we have many DMA buffers.

Here are some figures:

CM7 core:
SystemCoreClock = 996000000 Hz, MPU disabled, DCache disabled
                  READ           WRITE
DTCM:       1762252602       963536823    words/s
OCRAM1:       41535756       480395668    words/s

CM4 core:
SystemCoreClock = 392727258 Hz, MPU disabled, DCache disabled
                  READ           WRITE
DTCM:        321593532       348561397    words/s
OCRAM1:       21156189        17457387    words/s


Notes:
1 word = 4 bytes = 32 bits
Test code executes from ITCM.
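
The original test code is not shown in the thread. As a rough sketch of how words/s figures like these could be measured, the C below times a volatile read loop and a volatile write loop and converts cycles to words per second. The names `cycle_counter_fn`, `time_read`, `time_write`, `words_per_second` and `fake_counter` are all hypothetical; on a Cortex-M target the counter hook would typically read `DWT->CYCCNT` from CMSIS, which is an assumption and is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook returning a free-running cycle count.
 * Assumption: on target this would wrap DWT->CYCCNT (CMSIS). */
typedef uint32_t (*cycle_counter_fn)(void);

/* Host-only stand-in counter (hypothetical): advances 100 "cycles" per call. */
static uint32_t fake_cycles_state;
static uint32_t fake_counter(void)
{
    fake_cycles_state += 100u;
    return fake_cycles_state;
}

/* Read `words` 32-bit words through a volatile pointer so the compiler
 * cannot drop or merge the loads. Returns the elapsed cycle count. */
static uint32_t time_read(const volatile uint32_t *buf, size_t words,
                          cycle_counter_fn now, uint32_t *sink)
{
    uint32_t acc = 0;
    uint32_t start = now();
    for (size_t i = 0; i < words; ++i) {
        acc += buf[i];
    }
    uint32_t end = now();
    *sink = acc;          /* keep the loaded data observable */
    return end - start;   /* unsigned subtraction tolerates one counter wrap */
}

/* Write `words` 32-bit words through a volatile pointer. */
static uint32_t time_write(volatile uint32_t *buf, size_t words,
                           cycle_counter_fn now)
{
    uint32_t start = now();
    for (size_t i = 0; i < words; ++i) {
        buf[i] = (uint32_t)i;
    }
    uint32_t end = now();
    return end - start;
}

/* Convert a measurement into words per second for a given core clock. */
static uint64_t words_per_second(uint64_t words, uint32_t cycles,
                                 uint32_t core_hz)
{
    assert(cycles != 0u);
    return words * core_hz / cycles;
}
```

On target, the buffer would be placed in DTCM or OCRAM1 by the linker script and the loop code in ITCM, as in the notes above. Loop overhead is included in the count, so results from a sketch like this are only comparable between memories, not exact bus figures.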

In either case, based on the respective CPU clock, DTCM performance makes sense to me.
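
To relate the figures to the core clock, it can help to convert them to core cycles per word. The helper below is a minimal sketch (`cycles_per_word` is a hypothetical name); it simply divides the reported SystemCoreClock by the reported words/s. With the numbers above this gives roughly 24 cycles per OCRAM1 read and about 2 per write on the CM7, and roughly 18.6 per read and 22.5 per write on the CM4.

```c
#include <assert.h>
#include <stdint.h>

/* Core cycles spent per 32-bit word, from the core clock and the
 * measured throughput in words per second. */
static double cycles_per_word(uint32_t core_hz, uint32_t words_per_s)
{
    assert(words_per_s != 0u);
    return (double)core_hz / (double)words_per_s;
}
```

For example, `cycles_per_word(996000000u, 41535756u)` corresponds to the CM7 OCRAM1 read row.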

Questions:
1) Why is read access much slower than write access?
2) Why is access to OCRAM from CM4 core much slower than CM7?

Thanks in advance for any comments.
Udo

jingpan
NXP TechSupport

Hi @udoeb ,

1. The M7 core accesses OCRAM1 and OCRAM2 over the AXI bus, which is controlled by the NIC-301 AXI arbiter IP. The Arm core accesses the AXI bus through a pipelined mechanism. Writes are pipelined: the store instruction can complete without the data having been placed on the bus yet. A read instruction, however, must wait until the data comes back, so the pipeline does not help. This is why writing to OCRAM is much faster than reading.

2. The CM4 and CM7 reach OCRAM1 and OCRAM2 over different paths; see Figure 2-2 in the reference manual. A CM4 request goes through the XB (LPSR domain, AHB protocol) and then through the NIC (WAKEUPMIX domain, AXI protocol), so it is limited by the BUS / BUS_LPSR clocks. Both OCRAMs are accessible only via the SYSTEM bus, so Harvard-style access is not possible in this case. If any other bus master accesses the same memory (OCRAM1 or OCRAM2), performance degrades further because of arbitration on the XB or NIC.

 

Regards,

Jing
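
The read/write asymmetry described in point 1 can be illustrated with a toy cycle-count model. This is only a sketch under stated assumptions: the function names are hypothetical, and the latency and issue-interval parameters are illustrative inputs, not values taken from the reference manual. Blocking reads pay the full round-trip latency for every word, while pipelined writes expose that latency only once.

```c
#include <assert.h>
#include <stdint.h>

/* Blocking reads: each load waits for its data before the next one issues,
 * so every access costs the full round-trip latency. */
static uint64_t blocking_read_cycles(uint64_t n, uint32_t latency)
{
    return n * latency;
}

/* Pipelined writes: a new store issues every `issue_interval` cycles and
 * only the final store's latency is exposed at the end. */
static uint64_t pipelined_write_cycles(uint64_t n, uint32_t issue_interval,
                                       uint32_t latency)
{
    assert(issue_interval > 0u && issue_interval <= latency);
    if (n == 0u) {
        return 0u;
    }
    return (n - 1u) * issue_interval + latency;
}
```

With an assumed latency of 24 cycles and an assumed issue interval of 2 cycles, 1000 reads cost 24000 cycles in this model while 1000 writes cost 2022, a ratio close to the ~12x seen in the CM7 measurements. The model ignores write-buffer depth and arbitration, so it is a way to reason about the effect, not a prediction.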

udoeb
Contributor II

Hi @jingpan,

Thanks for your feedback. I understand. More questions:

3) Is there anything we can do to improve M4 access to OCRAM?

4) Is our clock setup optimal? BUS_CLK, BUS_LPSR_CLK and M4_CLK come from SYS_PLL3 (480MHz) while M7_CLK and AXI_CLK come from ARM_PLL. Some values from the generated clock_config.c are shown below.

...
- {id: ARM_PLL_CLK.outFreq, value: 996 MHz}
- {id: AXI_CLK_ROOT.outFreq, value: 996 MHz}
- {id: M7_CLK_ROOT.outFreq, value: 996 MHz}
- {id: BUS_CLK_ROOT.outFreq, value: 240 MHz}
- {id: BUS_LPSR_CLK_ROOT.outFreq, value: 160 MHz}
- {id: M4_CLK_ROOT.outFreq, value: 4320/11 MHz}
...
jingpan
NXP TechSupport
NXP TechSupport
Hi @udoeb ,

There is little you can do on the hardware side to improve read speed. Please refer to AN12437 to see whether a software approach helps.

 

Regards,

Jing
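
The thread does not say which software approach is meant. One option often used for DMA receive buffers on Cortex-M7 parts, offered here purely as an assumption, is to make the buffer region cacheable and invalidate the affected cache lines before the CPU reads fresh DMA data. The sketch below shows the line-alignment arithmetic; `cache_align_range` and `dma_rx_buffer_prepare_read` are hypothetical names. The CMSIS call `SCB_InvalidateDCache_by_Addr` is only compiled in when a device header defining `__DCACHE_PRESENT` has been included before this code, and the MPU setup that makes the region cacheable is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DCACHE_LINE_BYTES 32u   /* Cortex-M7 data cache line size */

typedef struct {
    uintptr_t start;  /* line-aligned start address */
    size_t    len;    /* length rounded up to whole cache lines */
} cache_range_t;

/* Widen [addr, addr + len) to whole cache lines. */
static cache_range_t cache_align_range(uintptr_t addr, size_t len)
{
    const uintptr_t mask = ~(uintptr_t)(DCACHE_LINE_BYTES - 1u);
    cache_range_t r;
    uintptr_t end = (addr + len + DCACHE_LINE_BYTES - 1u) & mask;
    r.start = addr & mask;
    r.len = (size_t)(end - r.start);
    return r;
}

/* Call after the DMA transfer completes and before the CPU reads the data.
 * Caveat: widening the range also discards other cached data that shares
 * those lines, so buffers should be line-aligned and line-sized. */
static void dma_rx_buffer_prepare_read(void *buf, size_t len)
{
    cache_range_t r = cache_align_range((uintptr_t)buf, len);
    assert(r.len >= len);
#if defined(__DCACHE_PRESENT) && (__DCACHE_PRESENT == 1U)
    /* Assumption: CMSIS-Core device header included by the project. */
    SCB_InvalidateDCache_by_Addr((void *)r.start, (int32_t)r.len);
#else
    (void)r;  /* host build or no data cache: nothing to invalidate */
#endif
}
```

Whether this helps depends on the access pattern: cached reads fetch whole lines from OCRAM at once, but the invalidate call has its own cost, and this path applies to the CM7 data cache only.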
