Hi,
We have a custom board based on SabreSD. We've been able to decode and play back H.264 files at 1080p@60fps in Android 4.3 without skipped frames. Now that we've moved to Android 5.1.1, we can no longer reach the same performance level.
Having investigated the issue, one of the most striking differences that I observed was the drop in the VPU average burst write size obtained with the mmdc profiling tool, while playing an H.264 1080p@60fps file:
MMDC VPU | Android 4.3 | Android 5.1 |
Total cycles count | 264478000 | 264073304 |
Busy cycles count | 244209294 | 246661781 |
Read accesses count | 7721279 | 5923053 |
Write accesses count | 1598636 | 4961559 |
Read bytes count | 220912824 | 172561608 |
Write bytes count | 98518144 | 81773840 |
Avg. Read burst size | 28 | 29 |
Avg. Write burst size | 61 | 16 |
Read | 420.52 MB/s | 329.14 MB/s |
Write | 187.53 MB/s | 155.97 MB/s |
Total | 608.05 MB/s | 485.11 MB/s |
Utilization | 8% | 6% |
Overall Bus Load | 92% | 93% |
Bytes Access | 34 | 23 |
I understand that the overall problem is essentially the MMDC throughput limit. The overall AXI bus load is slightly higher in 5.1, and I suspect that in 4.3 we come very close to the limit for decoding and playing H.264 at 60 fps, whereas in 5.1 we're over it. It also seems that the VPU has suddenly become less efficient, and the reason why is what I'm trying to figure out.
Just some additional info:
- In 5.1 when the VPU is made to work without the IPU taking a piece of the DDR bandwidth (when decoded frames are simply dropped and not displayed, and the bus load drops below 90%), the write burst size is still 16, so it is not a "fight against the IPU" throughput issue.
- In 5.1 when the VPU is decoding the frames in this fashion it takes <16ms on average for each frame (like in 4.3). When the decoded frames are being converted and displayed by IPU, as they should be, it takes >16ms on average for each frame to be decoded, hence the inability to do 60 fps.
- I played with the VPU QoS settings - didn't change anything.
- I tried mxc-vpu-test with all sorts of parameters - didn't change anything (actually, the best result was ~50 fps with G2D output in 5.1, whereas in 4.3 outputting with V4L2 easily gave ~60 fps).
- I put the 4.3 VPU firmware (2.3.10) instead of the 5.1 VPU firmware (3.1.1) - didn't change anything.
- On a SabreSD running Android 6, which I also tried, the write burst size is even lower (~12!) and there are lots of skipped frames.
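To make the 16 ms figure in the list above concrete, here is a minimal sketch of the per-frame time budget. The function names are hypothetical and used only for illustration; the only input is the target frame rate.

```python
def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def sustainable_fps(decode_ms):
    """Highest frame rate that can be sustained if every frame takes decode_ms."""
    return 1000.0 / decode_ms


if __name__ == "__main__":
    # At 60 fps each frame has roughly 16.67 ms, so any average decode
    # time above that budget means frames have to be skipped.
    print(f"budget at 60 fps: {frame_budget_ms(60):.2f} ms")
    print(f"rate at 20 ms per frame: {sustainable_fps(20.0):.1f} fps")
```

The 20 ms value is an arbitrary illustration, not a measurement from this board.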
Do you have an idea why this write burst size drop has happened? Is there a way I can make the VPU use the memory bandwidth more efficiently, so that the bus load decreases, utilization increases and I can have IPU and VPU both use the DRAM happily and let me watch my H.264 1080p@60 in peace, like before?
Thank you,
-Stan
Can you try reverting the commit below in myandroid/external/linux-lib?
commit f6b277b909bf007221d670e99b21e612b43a154c
Author: Hongzhang Yang <Hongzhang.Yang@freescale.com>
Date: Mon Jun 15 09:27:36 2015 +0800
MLK-10871 VPU is blocked
vpu lib v5.4.31
Bug: VPU is blocked in BWB module in some cases
Solution: Disable BWB
Signed-off-by: Hongzhang Yang <Hongzhang.Yang@freescale.com>
Thanks so much! This was it. Now it's back to normal:
MMDC VPU
***********************
Measure time: 500ms
Total cycles count: 264043312
Busy cycles count: 241802139
Read accesses count: 6880957
Write accesses count: 1547620
Read bytes count: 216889992
Write bytes count: 95349632
Avg. Read burst size: 31
Avg. Write burst size: 61
Read: 413.68 MB/s / Write: 181.86 MB/s Total: 595.55 MB/s
Utilization: 8%
Overall Bus Load: 91%
Bytes Access: 37
One thing to note: after you revert that patch, there is a very small chance that your system will hang when running multiple video decoder instances.
Hi Stanislav
According to section 5 (Codec Specification) of the attached
i.MX_Android_Extended_Codec_Release_Notes.pdf,
the guaranteed H.264 1080p performance is 30 fps.
Best regards
igor
Hi Igor,
I'm aware of the official specs and of the fact that we're pushing the limits here. This isn't a complaint about the degraded performance. I was just hoping that someone could shed some light specifically on why the VPU write burst size over a x64 bus would drop to 16 in the latest versions, and what could be done to change it. I'm just trying to better understand the i.MX6 architecture here.
Thanks,
Stan
Hi Stan
Write burst size is derived as: Write burst size = Write bytes count / Write accesses count,
so this suggests that additional bus masters are grabbing DDR bandwidth from the VPU.
This is expected with every new Android/Linux release, as more functionality is (usually)
added. Additionally, one can try the NIC-301 settings (Chapter 45 of the i.MX6DQ RM) or simply
remove/disable drivers to find the cause of the reduced VPU bandwidth.
Best regards
igor
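For reference, the derived lines in the profiler output can be recomputed from the raw counters. The sketch below is not the mmdc tool's source: the function name is hypothetical, integer truncation is assumed because it matches the printed values, and the factor of 16 bytes per busy cycle in the utilization figure is an assumption that happens to reproduce the numbers posted in this thread.

```python
def mmdc_derived(total_cycles, busy_cycles, rd_acc, wr_acc, rd_bytes, wr_bytes):
    """Recompute the summary figures from raw MMDC profiling counters."""
    total_bytes = rd_bytes + wr_bytes
    return {
        # Average burst size = bytes transferred / number of accesses
        "avg_read_burst": rd_bytes // rd_acc,
        "avg_write_burst": wr_bytes // wr_acc,
        # Average bytes per access, reads and writes combined
        "bytes_access": total_bytes // (rd_acc + wr_acc),
        # Share of cycles in which the MMDC was busy
        "bus_load_pct": 100 * busy_cycles // total_cycles,
        # ASSUMPTION: 16 bytes could be moved per busy cycle
        "utilization_pct": 100 * total_bytes // (busy_cycles * 16),
    }


if __name__ == "__main__":
    # Raw counters taken from the Android 4.3 / 5.1 comparison in the first post
    android_43 = mmdc_derived(264478000, 244209294, 7721279, 1598636,
                              220912824, 98518144)
    android_51 = mmdc_derived(264073304, 246661781, 5923053, 4961559,
                              172561608, 81773840)
    print("4.3:", android_43)
    print("5.1:", android_51)
```

Fed with the counters from the first post, this shows that in 5.1 roughly three times as many write accesses move slightly fewer bytes, which is exactly what the lower average write burst size expresses.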
Hi Igor,
Thank you for the clarifications. That was my understanding as well and that's why I did a test where the frames decoded by VPU are simply dropped and not displayed. I expected the bus load to drop and to see the VPU getting enough bandwidth to do whatever it is doing. I expected the average write burst to become close to 64 in this case, like in 4.3.
Surprisingly, what I see in this case is that, although the bus load does drop significantly (down to 76%) and the VPU gets its share of the bandwidth (590.57MB/s, close to what it uses up in 4.3), the average write burst size still remains 16!
These are the measurements:
MMDC VPU
***********************
Measure time: 500ms
Total cycles count: 264044584
Busy cycles count: 201964997
Read accesses count: 6833583
Write accesses count: 6065570
Read bytes count: 209241968
Write bytes count: 100384432
Avg. Read burst size: 30
Avg. Write burst size: 16
Read: 399.10 MB/s / Write: 191.47 MB/s Total: 590.57 MB/s
Utilization: 9%
Overall Bus Load: 76%
Bytes Access: 24
MMDC SUM
***********************
Measure time: 500ms
Total cycles count: 264044512
Busy cycles count: 200866383
Read accesses count: 10714476
Write accesses count: 6085360
Read bytes count: 458671512
Write bytes count: 101112032
Avg. Read burst size: 42
Avg. Write burst size: 16
Read: 874.85 MB/s / Write: 192.86 MB/s Total: 1067.70 MB/s
Utilization: 17%
Overall Bus Load: 76%
Bytes Access: 33
So it seems that no matter how much easier I make the VPU's life, it never goes above 16-byte write bursts, in contrast with 4.3.
Do you have any idea why this is happening?
Best regards,
Stan