Hi all,
We've been measuring memory bandwidth with the phoronix-test-suite on an i.MX8 QuadMax evaluation kit (MEK) board. Most of the tests show results around 5-7 GB/s. However, our understanding is that the theoretical maximum bandwidth is around 25 GB/s (3200 MT/s, 64 bits). Is there a reason why that limit cannot be reached empirically? Can something be done in the internal bus configuration/QoS system?
Thanks in advance,
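For reference, the 25 GB/s figure above can be reproduced with a quick calculation. This is only a sketch: it assumes the peak is simply transfer rate times bus width, with no protocol or refresh overhead, and the function name is made up for illustration.

```python
def peak_bandwidth_gb_s(transfer_rate_mt_s: float, bus_width_bits: int) -> float:
    """Idealized peak bandwidth in GB/s (10^9 bytes/s), ignoring all overheads."""
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mt_s * 1e6 * bytes_per_transfer / 1e9


# Values quoted in the post: 3200 MT/s on a 64-bit interface.
peak = peak_bandwidth_gb_s(3200, 64)
print(f"theoretical peak: {peak:.1f} GB/s")

# Fraction of the idealized peak reached by the measured 5-7 GB/s range.
for measured in (5.0, 7.0):
    print(f"{measured:.0f} GB/s measured = {measured / peak:.0%} of peak")
```

With these inputs the idealized peak is 25.6 GB/s, so the measured 5-7 GB/s corresponds to roughly 20-27% of it.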
Hi Alejandro
The theoretical maximum bandwidth cannot be achieved on a real device because of limitations
in the internal buses and arbiters. As an example, see the NIC301 arbiter description in AN4947:
https://www.nxp.com/docs/en/application-note/AN4947.pdf
Results around 5-7 GB/s are similar to NXP's internal memory performance data.
In general, the internal bus configuration/QoS can be changed, but unfortunately NXP does not support this
(no documentation or examples are provided). The default configuration/QoS provides optimal performance
characteristics, and any modification will most likely result in a performance drop rather than an improvement.
Best regards
igor
Hi Igor,
Thanks a lot for the reply.
We've also been monitoring the memory bandwidth using the perf counters, as shown below:
root@imx8qmmek-b0:~# perf stat -a -M i.MX8QM_DDR_MON sleep 1
Performance counter stats for 'system wide':
15699996 imx8_ddr0/read-cycles/ # 251199936.0 imx8qm-ddr0-all-r
33825 imx8_ddr0/write-cycles/ # 541200.0 imx8qm-ddr0-all-w
15694800 imx8_ddr1/read-cycles/ # 251116800.0 imx8qm-ddr1-all-r
27415 imx8_ddr1/write-cycles/ # 438640.0 imx8qm-ddr1-all-w
From this output, we understand that, in that specific situation, the total memory bandwidth in use (read + write) is about 480 MB/s: (251199936 + 541200 + 251116800 + 438640) / 2^20 ≈ 480. This would make sense, as we associate it with the bandwidth used by the framebuffer being displayed.
However, we would like to know whether the bandwidth read from the perf counters could be higher than 5-7 GB/s if other subsystems inside the SoC, such as the GPUs or VPU, came into play. That is, could the throughput between the SoC and the external DDR RAM actually be higher than the value measured with phoronix? If so, do you have any measurements showing the maximum empirical figure?
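The sum above can be rechecked with a short script. This is a sketch under two assumptions that do not come from NXP documentation: the factor of 16 bytes per counted cycle is inferred from the posted output itself (251199936 / 15699996 = 16), and the measurement interval is taken to be the 1 second of `sleep 1`.

```python
# Assumption: each counted cycle moves 16 bytes, inferred from the ratio
# between the raw counts and the metric values in the perf output.
BYTES_PER_CYCLE = 16

# Raw counter values copied from the perf stat output.
counters = {
    "imx8_ddr0/read-cycles/": 15699996,
    "imx8_ddr0/write-cycles/": 33825,
    "imx8_ddr1/read-cycles/": 15694800,
    "imx8_ddr1/write-cycles/": 27415,
}


def total_bytes(cycle_counts: dict, bytes_per_cycle: int = BYTES_PER_CYCLE) -> int:
    """Total bytes moved across all counters during the measurement."""
    return sum(cycle_counts.values()) * bytes_per_cycle


interval_s = 1.0  # assumption: perf ran for the duration of `sleep 1`
total = total_bytes(counters)
mib_per_s = total / interval_s / 2**20  # binary units, as in the post
mb_per_s = total / interval_s / 1e6     # decimal units

print(f"total: {total} bytes in {interval_s:.0f} s")
print(f"{mib_per_s:.0f} MiB/s, or {mb_per_s:.0f} MB/s")
```

Dividing by 2^20 gives about 480, which is strictly MiB/s; in decimal units the same traffic is about 503 MB/s. Either way it is well below the 5-7 GB/s measured with the benchmark.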
Thanks again.
Best regards,
Ale