i.mx8: not reaching maximum memory bandwidth


asconcepcion
Contributor I

Hi all,

We've been measuring memory bandwidth with the phoronix-test-suite on an i.MX8 QuadMax evaluation kit (MEK) board. Most of the tests show results of around 5-7 GB/s. However, our understanding is that the theoretical maximum bandwidth is around 25 GB/s (3200 MT/s, 64 bits). Is there a reason why that limit is not reached empirically? Can something be done in the internal bus configuration/QoS system?
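For reference, the 25 GB/s figure can be reproduced with a quick calculation. This is only a sketch using the numbers quoted above (3200 MT/s, 64-bit bus); the function name is made up for illustration, and the result is the ideal peak with no allowance for refresh, bus turnaround, or arbitration overhead.

```python
# Theoretical peak DRAM bandwidth from the figures quoted in the post.
# peak_bandwidth_gbs is a hypothetical helper, not part of any NXP tool.

def peak_bandwidth_gbs(transfer_rate_mts: float, bus_width_bits: int) -> float:
    """Return the ideal peak bandwidth in GB/s (10^9 bytes per second)."""
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mts * 1e6 * bytes_per_transfer / 1e9

peak = peak_bandwidth_gbs(3200, 64)
print(f"theoretical peak: {peak:.1f} GB/s")

# Compare the measured range (5-7 GB/s) against the ideal peak.
for measured in (5.0, 7.0):
    print(f"{measured:.0f} GB/s is {measured / peak:.0%} of peak")
```

With these inputs the peak works out to 25.6 GB/s, so the measured 5-7 GB/s corresponds to roughly a fifth to a quarter of the ideal figure.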

Thanks in advance,

Ale

igorpadykov
NXP Employee

Hi Alejandro

 

The theoretical maximum bandwidth cannot be achieved on a real device because of limitations in the internal buses and arbiters. As an example, see the NIC301 arbiter description in AN4947:

https://www.nxp.com/docs/en/application-note/AN4947.pdf

Results of around 5-7 GB/s are similar to NXP's internal memory performance data.

In general, the internal bus configuration/QoS can be changed, but unfortunately NXP does not support this (no documentation or examples are provided). The default configuration/QoS provides optimal performance characteristics, and any modification will most probably result in a performance drop rather than an improvement.

 

Best regards
igor


asconcepcion
Contributor I

Hi Igor,

Thanks a lot for the reply.

We've also been monitoring the memory bandwidth using the perf counters, as shown below:

 

root@imx8qmmek-b0:~# perf stat -a -M i.MX8QM_DDR_MON sleep 1

 Performance counter stats for 'system wide':

          15699996      imx8_ddr0/read-cycles/    # 251199936.0 imx8qm-ddr0-all-r   
             33825      imx8_ddr0/write-cycles/   # 541200.0 imx8qm-ddr0-all-w      
          15694800      imx8_ddr1/read-cycles/    # 251116800.0 imx8qm-ddr1-all-r   
             27415      imx8_ddr1/write-cycles/   # 438640.0 imx8qm-ddr1-all-w

 

From this output, we understand that, in this specific situation, the total memory bandwidth in use (read + write) is about 480 MB/s ((251199936 + 541200 + 251116800 + 438640) / 2^20 ≈ 480 MB/s over the 1-second sample). That would make sense, as we attribute it to the bandwidth used by the framebuffer being displayed.
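The same arithmetic can be sketched in a few lines of Python. The counter values are copied from the perf output above; the 16 bytes per cycle scaling is an assumption inferred from that output (251199936 / 15699996 = 16), not taken from NXP documentation, and the interval is approximated as the 1 second of `sleep 1`. Strictly speaking, dividing by 2^20 gives MiB/s.

```python
# Raw cycle counts copied from the perf stat output above.
cycles = {
    "imx8_ddr0/read-cycles/": 15699996,
    "imx8_ddr0/write-cycles/": 33825,
    "imx8_ddr1/read-cycles/": 15694800,
    "imx8_ddr1/write-cycles/": 27415,
}

BYTES_PER_CYCLE = 16  # assumption: inferred from the metric column of the output
INTERVAL_S = 1.0      # assumption: approximate duration of `sleep 1`

def bandwidth_mib_s(counts, bytes_per_cycle=BYTES_PER_CYCLE, interval_s=INTERVAL_S):
    """Sum all counters, convert cycles to bytes, and return MiB per second."""
    total_bytes = sum(counts.values()) * bytes_per_cycle
    return total_bytes / interval_s / 2**20

total = bandwidth_mib_s(cycles)
print(f"total DDR traffic: {total:.0f} MiB/s")
```

Keeping the scaling factor and interval as named parameters makes it easy to rerun the calculation for a longer sampling window or a different counter configuration.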

However, we would like to know: could the bandwidth read from the perf counters exceed 5-7 GB/s if other subsystems inside the SoC, such as the GPUs or the VPU, came into play? That is, could the actual throughput between the SoC and the external DDR RAM be higher than the value measured with Phoronix? If so, do you have any measurements showing the maximum empirical figure?

Thanks again.

Best regards,

Ale
