IMX6Q - MMDC: VPU burst write size drop

stanislavsoukha
Contributor I

Hi,

We have a custom board based on SabreSD. We've been able to decode and play back H.264 files at 1080p@60fps in Android 4.3 without skipped frames. Now we've moved to Android 5.1.1 and are no longer able to reach the same performance level.

Having investigated the issue, one of the most striking differences that I observed was the drop in the VPU average burst write size obtained with the mmdc profiling tool, while playing an H.264 1080p@60fps file:

   

MMDC VPU                 Android 4.3     Android 5.1
Total cycles count       264478000       264073304
Busy cycles count        244209294       246661781
Read accesses count      7721279         5923053
Write accesses count     1598636         4961559
Read bytes count         220912824       172561608
Write bytes count        98518144        81773840
Avg. Read burst size     28              29
Avg. Write burst size    61              16
Read                     420.52 MB/s     329.14 MB/s
Write                    187.53 MB/s     155.97 MB/s
Total                    608.05 MB/s     485.11 MB/s
Utilization              8%              6%
Overall Bus Load         92%             93%
Bytes Access             34              23
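
For readers who want to see how the last three rows relate to the raw counters, here is a minimal Python sketch. The formulas are an assumption about what the profiling tool does: they reproduce the figures in the table, but they are not taken from the tool's source. In particular, the 16 bytes per busy cycle used for "Utilization" is a guess that happens to match the numbers.

```python
# Raw MMDC VPU counters copied from the table above.
COUNTERS = {
    "Android 4.3": dict(total_cycles=264478000, busy_cycles=244209294,
                        read_accesses=7721279, write_accesses=1598636,
                        read_bytes=220912824, write_bytes=98518144),
    "Android 5.1": dict(total_cycles=264073304, busy_cycles=246661781,
                        read_accesses=5923053, write_accesses=4961559,
                        read_bytes=172561608, write_bytes=81773840),
}

# Assumption: at most 16 bytes are transferred per busy cycle.
BYTES_PER_CYCLE = 16


def derived(c):
    """Recompute the percentage rows, truncated to integers like the tool output."""
    total_bytes = c["read_bytes"] + c["write_bytes"]
    total_accesses = c["read_accesses"] + c["write_accesses"]
    return {
        # Share of cycles in which the MMDC was busy.
        "bus_load_pct": 100 * c["busy_cycles"] // c["total_cycles"],
        # Bytes actually moved versus bytes that could move in the busy cycles.
        "utilization_pct": 100 * total_bytes // (c["busy_cycles"] * BYTES_PER_CYCLE),
        # Average bytes per access, reads and writes combined.
        "bytes_per_access": total_bytes // total_accesses,
    }


for name, counters in COUNTERS.items():
    print(name, derived(counters))
```

Under these assumptions the bus is about equally busy in both releases, but each busy cycle carries less data in 5.1.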

I understand that the overall problem is essentially the MMDC throughput limit. On average the overall AXI bus load is ~4% higher in 5.1, and I suspect that 4.3 comes very close to the limit for decoding and playing H.264 at 60 fps, whereas 5.1 goes over it. It also seems that the VPU has suddenly become less efficient. The reason why is what I'm trying to figure out.

Just some additional info:

- In 5.1 when the VPU is made to work without the IPU taking a piece of the DDR bandwidth (when decoded frames are simply dropped and not displayed, and the bus load drops below 90%), the write burst size is still 16, so it is not a "fight against the IPU" throughput issue.

- In 5.1 when the VPU is decoding the frames in this fashion it takes <16ms on average for each frame (like in 4.3). When the decoded frames are being converted and displayed by IPU, as they should be, it takes >16ms on average for each frame to be decoded, hence the inability to do 60 fps.

- I played with VPU QoS settings - didn't change anything

- I tried mxc-vpu-test with all sorts of parameters - didn't change anything (actually, the best result was ~50 fps with G2D output in 5.1, whereas in 4.3 outputting with V4L2 easily gave ~60 fps)

- I put the 4.3 VPU firmware (2.3.10) instead of the 5.1 VPU firmware (3.1.1) - didn't change anything.

- In Android 6, which I tried on a SabreSD, the write burst size is even lower (~12!) and there are lots of skipped frames.
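
The 16 ms figure in the notes above is the per-frame time budget at 60 fps. A minimal Python sketch of that arithmetic follows. It assumes frames are decoded strictly one after another, and the function names are made up for illustration.

```python
def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def max_fps(decode_ms):
    """Highest sustainable frame rate if each frame takes decode_ms to decode."""
    return 1000.0 / decode_ms


def sustains(decode_ms, fps):
    """True if the average decode time fits inside the frame budget."""
    return decode_ms <= frame_budget_ms(fps)


print(round(frame_budget_ms(60), 2))  # budget per frame at 60 fps
print(sustains(16.0, 60))             # just under the budget
print(sustains(17.0, 60))             # just over the budget
print(max_fps(20.0))                  # 20 ms per frame corresponds to 50 fps
```

Under this assumption, the ~50 fps result with G2D output corresponds to roughly 20 ms per frame.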

Do you have an idea why this write burst size drop has happened? Is there a way I can make the VPU use the memory bandwidth more efficiently, so that the bus load decreases, utilization increases and I can have IPU and VPU both use the DRAM happily and let me watch my H.264 1080p@60 in peace, like before?

Thank you,

-Stan

chenguoyin
NXP Employee

Can you try reverting the commit below in myandroid/external/linux-lib?

commit f6b277b909bf007221d670e99b21e612b43a154c
Author: Hongzhang Yang <Hongzhang.Yang@freescale.com>
Date:   Mon Jun 15 09:27:36 2015 +0800

    MLK-10871 VPU is blocked

    vpu lib v5.4.31

    Bug: VPU is blocked in BWB module in some cases

    Solution: Disable BWB

    Signed-off-by: Hongzhang Yang <Hongzhang.Yang@freescale.com>

stanislavsoukha
Contributor I

Thanks so much! This was it. Now it's back to normal:

MMDC VPU
***********************
Measure time: 500ms
Total cycles count: 264043312
Busy cycles count: 241802139
Read accesses count: 6880957
Write accesses count: 1547620
Read bytes count: 216889992
Write bytes count: 95349632
Avg. Read burst size: 31
Avg. Write burst size: 61
Read: 413.68 MB/s /  Write: 181.86 MB/s  Total: 595.55 MB/s
Utilization: 8%
Overall Bus Load: 91%
Bytes Access: 37
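
To compare runs like this one without reading the numbers by eye, the tool output can be parsed with a few lines of Python. This is only a sketch: the parser is hypothetical, handles just the "Name: integer" lines shown here, and assumes 1 MB = 2**20 bytes when recomputing bandwidth, which is what the printed MB/s figures appear to use.

```python
import re

# Tool output pasted from the post above.
SAMPLE = """\
MMDC VPU
***********************
Measure time: 500ms
Total cycles count: 264043312
Busy cycles count: 241802139
Read accesses count: 6880957
Write accesses count: 1547620
Read bytes count: 216889992
Write bytes count: 95349632
Avg. Read burst size: 31
Avg. Write burst size: 61
Read: 413.68 MB/s /  Write: 181.86 MB/s  Total: 595.55 MB/s
Utilization: 8%
Overall Bus Load: 91%
Bytes Access: 37
"""

# Matches whole lines of the form "Name: <integer>" with an optional "ms" or "%".
# The combined "Read: ... / Write: ..." bandwidth line is skipped on purpose.
LINE_RE = re.compile(r"\s*([^:]+?):\s*(\d+)\s*(?:ms|%)?\s*")


def parse_mmdc(text):
    """Return a dict mapping counter names to integer values."""
    stats = {}
    for line in text.splitlines():
        m = LINE_RE.fullmatch(line)
        if m:
            stats[m.group(1)] = int(m.group(2))
    return stats


def bandwidth_mb_s(byte_count, window_ms):
    """Bandwidth in MB/s, assuming 1 MB = 2**20 bytes."""
    return byte_count / (window_ms / 1000.0) / 2**20


stats = parse_mmdc(SAMPLE)
print(stats["Write bytes count"] // stats["Write accesses count"])
print(round(bandwidth_mb_s(stats["Read bytes count"], stats["Measure time"]), 2))
```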

chenguoyin
NXP Employee

One thing to note: after you revert that patch, there is a very small chance that your system will hang when multiple video decoder instances are running.

igorpadykov
NXP Employee

Hi Stanislav

According to section 5, "Codec Specification", of the attached i.MX_Android_Extended_Codec_Release_Notes.pdf, the guaranteed H.264 performance at 1080p is 30 fps.

Best regards

igor

stanislavsoukha
Contributor I

Hi Igor,

I'm aware of the official specs and of the fact that we're pushing the limits here. This isn't a complaint about the degraded performance. I was just hoping that someone could shed some light on why, specifically, the VPU write burst size over a x64 bus would drop to 16 in the latest versions, and what could be done to change it. Just trying to better understand the i.MX6 architecture here.

Thanks,

Stan

igorpadykov
NXP Employee

Hi Stan

Write burst size is derived as Write burst size = Write bytes count / Write accesses count, so the drop means that there are additional bus masters grabbing DDR bandwidth from the VPU. This is expected with every new Android/Linux release, as more functionality is (usually) added. Additionally, one can try the NIC-301 settings (Chapter 45 of the i.MX6DQ RM) or just remove/disable drivers to find the cause of the reduced VPU bandwidth.
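
The formula above can be checked against the counters from the first post with a short Python sketch. Integer truncation is assumed here because it matches the values the tool printed.

```python
def avg_burst_size(bytes_count, accesses_count):
    """Average burst size in bytes per access, truncated to an integer."""
    return bytes_count // accesses_count


# Write counters from the first post: (write bytes, write accesses).
WRITE_COUNTERS = {
    "Android 4.3": (98518144, 1598636),
    "Android 5.1": (81773840, 4961559),
}

for release, (nbytes, accesses) in WRITE_COUNTERS.items():
    print(release, avg_burst_size(nbytes, accesses))
```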

Best regards

igor

stanislavsoukha
Contributor I

Hi Igor,

Thank you for the clarifications. That was my understanding as well and that's why I did a test where the frames decoded by VPU are simply dropped and not displayed. I expected the bus load to drop and to see the VPU getting enough bandwidth to do whatever it is doing. I expected the average write burst to become close to 64 in this case, like in 4.3.

Surprisingly, what I see in this case is that, although the bus load does drop significantly (down to 76%) and the VPU gets its share of the bandwidth (590.57MB/s, close to what it uses up in 4.3), the average write burst size still remains 16!

These are the measurements:

MMDC VPU
***********************
Measure time: 500ms
Total cycles count: 264044584
Busy cycles count: 201964997
Read accesses count: 6833583
Write accesses count: 6065570
Read bytes count: 209241968
Write bytes count: 100384432
Avg. Read burst size: 30
Avg. Write burst size: 16
Read: 399.10 MB/s /  Write: 191.47 MB/s  Total: 590.57 MB/s
Utilization: 9%
Overall Bus Load: 76%
Bytes Access: 24

MMDC SUM
***********************
Measure time: 500ms
Total cycles count: 264044512
Busy cycles count: 200866383
Read accesses count: 10714476
Write accesses count: 6085360
Read bytes count: 458671512
Write bytes count: 101112032
Avg. Read burst size: 42
Avg. Write burst size: 16
Read: 874.85 MB/s /  Write: 192.86 MB/s  Total: 1067.70 MB/s
Utilization: 17%
Overall Bus Load: 76%
Bytes Access: 33

So it seems that no matter how much easier I make the VPU's life, it never goes above 16-byte write bursts, in contrast with 4.3.
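
To put a number on the contrast, the Python sketch below compares the VPU write counters of this frames-dropped 5.1 run with the 4.3 counters from the first post. It only restates the measurements already quoted in the thread. The variable names are illustrative.

```python
# (write bytes, write accesses) for the VPU in each run.
RUN_4_3 = (98518144, 1598636)              # Android 4.3, frames displayed
RUN_5_1_NO_DISPLAY = (100384432, 6065570)  # Android 5.1, decoded frames dropped

bytes_ratio = RUN_5_1_NO_DISPLAY[0] / RUN_4_3[0]
access_ratio = RUN_5_1_NO_DISPLAY[1] / RUN_4_3[1]

# Nearly the same amount of data is written in both runs...
print(f"write bytes ratio:    {bytes_ratio:.2f}")
# ...but 5.1 needs several times more write accesses to do it.
print(f"write accesses ratio: {access_ratio:.2f}")
```

The two runs write almost the same number of bytes in 500 ms, so the smaller burst size shows up as many more write accesses rather than as less data.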

Do you have any idea why this is happening?

Best regards,

Stan
