Question about practical DDR bandwidth for IMX8MP

455 Views
mkeey
Contributor II

Hello everyone,

I'm currently evaluating the theoretical and the maximum achievable DDR bandwidth of the i.MX 8M Plus processor. My setup is the IMX8MP-EVK evaluation board running Yocto Linux with kernel 6.1.55.

The board is equipped with 6 GB of LPDDR4, and the DDR core clock is set to 1 GHz. In my understanding, the DDR I/O clock runs at twice the core clock (2 GHz), and since data is transferred on both clock edges, this gives a data rate of 4000 MT/s. With the 32-bit memory interface, this leads to a theoretical bandwidth of 4000 MT/s * 4 B = 16 GB/s.
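
Spelled out step by step:

1 GHz core clock * 2 = 2 GHz I/O clock
2 GHz * 2 transfers per cycle (DDR) = 4000 MT/s
4000 MT/s * 4 B (32-bit bus) = 16 GB/s theoretical peak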

Regarding the practical bandwidth, I have read that the real achievable value is only about half of the theoretical one. To get some real numbers, I used perf stat with the integrated metrics (imx8mp_ddr_read.all, imx8mp_ddr_write.all) to obtain the following results while running a stress test with bandwidth64:

[Attachment: ddr_stresstest.png — perf stat output captured during the stress test]

I have the following questions:

  1. Are these numbers realistic, i.e. do they represent the bandwidth that can actually be achieved on such a system?
  2. What could be the reason that the read results are much lower than the write results? In my understanding, it should be the other way around.
  3. For the total bandwidth, do I have to add the read and write values together, or does the hardware handle reads and writes in parallel?

Best regards

Markus

3 Replies
416 Views
AldoG
NXP TechSupport

Hello,

You may use the perf tool for this test.

Cycle events are not recommended on the 8MP; we use the axid events instead. But we still use the metric feature for the test, which calls the axid events internally.

perf list metric


The supported metric is imx8mp-lpddr4-bandwidth-usage.

perf stat -a -I 1000 -M imx8mp-lpddr4-bandwidth-usage
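
Note that the exact metric name may differ between BSP releases; you can check what your kernel exposes with, for example:

perf list metric | grep -i ddr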


Best regards/Saludos,
Aldo.

406 Views
mkeey
Contributor II

Hi Aldo,

Thanks for your response. Maybe I didn't make myself clear in my previous post: I already used perf stat with the axid events and the integrated metrics to obtain the graph.

Specifically, I used the following command:

perf stat -I 1000 -a \
-e imx8_ddr0/axid-read,axi_mask=0xffff/,imx8_ddr0/axid-write,axi_mask=0xffff/ \
-M imx8mp_ddr_read.all,imx8mp_ddr_write.all

Using the metrics imx8mp_ddr_read.all and imx8mp_ddr_write.all gives me the results in kB, but with your suggested bandwidth-usage metric I only get percentages. This isn't optimal for me; I would prefer to have the values in kB if possible.
 
BTW: In my case, the metric is called imx8mp_bandwidth_usage.lpddr4:
 
perf stat -I 1000 -a -M imx8mp_bandwidth_usage.lpddr4
 
Best regards
Markus
148 Views
mkeey
Contributor II
For all users with similar questions:

In the end, I managed to find the solution myself. It seems that the original benchmark analysis and the graph from my first post give incorrect results. I'm not sure how the high numbers were calculated (maybe a counter overflow), but they are not correct.

The metric suggested by AldoG (`imx8mp_bandwidth_usage.lpddr4`) does nothing more than divide the sum of `imx8mp_ddr_read.all` and `imx8mp_ddr_write.all` by the theoretical bandwidth of 16 GByte/s. After some additional testing, I would say that the practical DDR bandwidth of the system is around 3.5 to 4.0 GByte/s.
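
As a quick cross-check against the 16 GByte/s theoretical peak: a displayed usage of 25 % corresponds to 0.25 * 16 GByte/s = 4 GByte/s of combined read and write traffic, which is consistent with the 3.5 to 4.0 GByte/s I measured under full load.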

If you want to measure the contribution of individual components to the bandwidth utilization, you can call `perf` with a list of per-component metrics, e.g.:

`perf stat -I 1000 -M imx8mp_ddr_read.isp1,imx8mp_ddr_write.isp1,imx8mp_ddr_read.dewarp,imx8mp_ddr_write.dewarp,imx8mp_ddr_read.vpu3,imx8mp_ddr_write.vpu3 -a <SOME_COMMAND>`
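
If the list of metrics gets long, a small shell sketch can build it. The master names below (isp1, dewarp, vpu3) are just the ones from the command above; check `perf list metric` on your BSP for the full set:

MASTERS="isp1 dewarp vpu3"
METRICS=$(for m in $MASTERS; do printf 'imx8mp_ddr_read.%s,imx8mp_ddr_write.%s,' "$m" "$m"; done)
perf stat -I 1000 -a -M "${METRICS%,}" -- <SOME_COMMAND>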