For all the users with similar questions:
In the end, I managed to find a solution myself. It seems that the original benchmark analysis and the graph from the first post give incorrect results. I'm not sure how those high numbers were calculated (possibly a counter overflow).
The metric suggested by AldoG (`imx8mp_bandwidth_usage.lpddr4`) does nothing more than divide the sum of `imx8mp_ddr_read.all` and `imx8mp_ddr_write.all` by a theoretical bandwidth of 16 GByte/s. After some additional testing, I would say that the practical DDR bandwidth of the system is around 3.5 to 4.0 GByte/s.
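To make the arithmetic concrete, here is a minimal Python sketch of that calculation, plus a rescaling against the practical range I measured. The function names are my own, and I'm assuming that the read/write values are already in GByte/s and that the metric is expressed as a fraction (not a percentage), so adjust if your `perf` output uses different units.

```python
# Values from the post: the denominator used by imx8mp_bandwidth_usage.lpddr4
# and the practical bandwidth range observed in testing (both in GByte/s).
THEORETICAL_GBPS = 16.0
PRACTICAL_GBPS_RANGE = (3.5, 4.0)


def utilization(read_gbps, write_gbps, capacity_gbps=THEORETICAL_GBPS):
    """Fraction of capacity used by combined read and write traffic.

    Assumes all three arguments are in GByte/s (hypothetical helper).
    """
    return (read_gbps + write_gbps) / capacity_gbps


def rescale_to_practical(reported_fraction):
    """Convert a fraction of the theoretical bandwidth into a (low, high)
    range of fractions of the practical bandwidth."""
    total_gbps = reported_fraction * THEORETICAL_GBPS
    low_cap, high_cap = PRACTICAL_GBPS_RANGE
    # Dividing by the larger capacity gives the lower utilization estimate.
    return total_gbps / high_cap, total_gbps / low_cap


if __name__ == "__main__":
    reported = utilization(1.0, 1.0)  # 1 GByte/s read + 1 GByte/s write
    low, high = rescale_to_practical(reported)
    print(f"reported: {reported:.1%}, practical: {low:.1%} to {high:.1%}")
```

In other words, a seemingly low reading against 16 GByte/s can already mean that the bus is more than half saturated in practice.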
If you want to measure the contribution of individual components to bandwidth utilization, you can call `perf` with a list of per-component metrics, e.g.:
`perf stat -I 1000 -M imx8mp_ddr_read.isp1,imx8mp_ddr_write.isp1,imx8mp_ddr_read.dewarp,imx8mp_ddr_write.dewarp,imx8mp_ddr_read.vpu3,imx8mp_ddr_write.vpu3 -a <SOME_COMMAND>`
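If you need this for a changing set of components, the metric list can be generated instead of typed by hand. The following Python sketch only assembles the argument list for the command above. The helper names are my own, the component names are the ones from my command, and `sleep 5` is just a placeholder workload.

```python
def ddr_metrics(components):
    """Build the comma-separated -M value: one read and one write metric
    per component, following the imx8mp_ddr_{read,write}.<component> pattern."""
    names = []
    for component in components:
        names.append(f"imx8mp_ddr_read.{component}")
        names.append(f"imx8mp_ddr_write.{component}")
    return ",".join(names)


def perf_args(components, command, interval_ms=1000):
    """Return the argv list for the perf stat call shown above.

    `command` is the workload as a list of strings (hypothetical helper).
    """
    return [
        "perf", "stat",
        "-I", str(interval_ms),   # print counts every interval_ms milliseconds
        "-M", ddr_metrics(components),
        "-a",                     # system-wide collection
        *command,
    ]


if __name__ == "__main__":
    args = perf_args(["isp1", "dewarp", "vpu3"], ["sleep", "5"])
    print(" ".join(args))
    # To actually run it on the target: subprocess.run(args, check=True)
```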