iMX6Q VPU performance

kristofferglemb
Contributor III

Hi there,

 

We're having performance issues when encoding 1920x1080 @ 30 fps on a custom board using an i.MX6Q with LPDDR2 PoP memory. The LPDDR2 memory is clocked at 396 MHz and the VPU runs at 270 MHz.

 

The VPU is not able to keep up at this rate. Using a GStreamer pipeline to create an MP4 from the encoded data, the limit currently seems to be below 24 fps.
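To put the 30 fps target in perspective, here is a quick calculation of the raw input stream, assuming planar YUV 4:2:0 source frames (12 bits per pixel, consistent with chromaInterleave 0 in the test below). It only counts reading each source frame once; an H.264 encoder also reads reference frames and writes reconstructed frames, so the real DRAM traffic is typically several times higher.

```python
# Back-of-the-envelope figures for 1920x1080 @ 30 fps.
# Assumption: planar YUV 4:2:0 input (full-resolution Y, quarter-resolution Cb and Cr).
WIDTH, HEIGHT, FPS = 1920, 1080, 30


def yuv420_frame_bytes(width: int, height: int) -> int:
    """Size in bytes of one planar YUV 4:2:0 frame."""
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)
    return luma + chroma


frame_bytes = yuv420_frame_bytes(WIDTH, HEIGHT)
input_bytes_per_s = frame_bytes * FPS
budget_ms = 1000 / FPS  # time available per frame

print(f"frame size: {frame_bytes} bytes")                 # 3110400 bytes
print(f"raw input:  {input_bytes_per_s / 1e6:.1f} MB/s")  # 93.3 MB/s
print(f"budget:     {budget_ms:.2f} ms per frame")        # 33.33 ms
```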

 

Using the mxc_vpu_test program from the imx-test package I get the following result on our board (encoding 50 1920x1080 YUV frames):

 

./mxc_vpu_test.out -C config_enc
[INFO]  VPU test program built on Mar 15 2016 13:55:16
[INFO]  Product Info: i.MX6Q/D/S
[INFO]  VPU firmware version: 3.1.1_r46070
[INFO]  VPU library version: 5.4.32
[INFO]  Format: STD_AVC
[INFO]  AVC
[INFO]  Input file "/tmp/foo.bin" opened.
[INFO]  Output file "test.264" opened.
[INFO]  Capture/Encode fps will be 30
[INFO]  ringBufferEnable 0, chromaInterleave 0, mapType 0, linear2TiledEnable 0
[INFO]  Finished encoding: 50 frames
[INFO]  enc fps = 30.06
[INFO]  total fps= 21.32

 

Trying exactly the same on a Nitrogen board the following result is obtained:

 

./mxc_vpu_test.out -C config_enc
[INFO]  VPU test program built on Mar  2 2016 08:53:02
[INFO]  Product Info: i.MX6Q/D/S
[INFO]  VPU firmware version: 3.1.1_r46070
[INFO]  VPU library version: 5.4.32
[INFO]  Format: STD_AVC
[INFO]  AVC
[INFO]  Input file "/tmp/foo.bin" opened.
[INFO]  Output file "test.264" opened.
[INFO]  Capture/Encode fps will be 30
[INFO]  ringBufferEnable 0, chromaInterleave 0, mapType 0, linear2TiledEnable 0
[INFO]  Finished encoding: 50 frames
[INFO]  enc fps = 39.27
[INFO]  total fps= 26.89
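Converting the reported rates into time per frame makes both runs easier to compare against the 33.3 ms budget for 30 fps. Treating the gap between "enc fps" and "total fps" as test-program overhead (for example file I/O) is an assumption; the test output does not say what "total" includes.

```python
# Per-frame times derived from the fps figures printed by mxc_vpu_test.
results = {
    "custom": {"enc_fps": 30.06, "total_fps": 21.32},
    "nitrogen": {"enc_fps": 39.27, "total_fps": 26.89},
}


def ms_per_frame(fps: float) -> float:
    return 1000.0 / fps


for board, r in results.items():
    enc_ms = ms_per_frame(r["enc_fps"])
    total_ms = ms_per_frame(r["total_fps"])
    # Assumed to be time spent outside the encode step itself.
    other_ms = total_ms - enc_ms
    print(f"{board:9s} encode {enc_ms:5.2f} ms, total {total_ms:5.2f} ms, other {other_ms:5.2f} ms")
```

On the custom board the encode step alone takes about 33.27 ms per frame, which leaves essentially no margin within the 33.33 ms budget.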

 

The config_enc config file looks like this:

 

# Write your options here!

# Type of operation encode or decode; encode = 1, decode = 2
operation=1

# read input from file. Mandatory for decode. If not specified for encode
# then default is camera
input=/tmp/foo.bin

# write output to file. For decode, if not specified, then default is LCD
output=test.264

# format; 0 - MPEG4, 1 - H.263, 2 - H.264, 7 - MJPG
format=2

# chromaInterleave, 1 - CbCr is interleaved
chromaInterleave=

# rotation angle (0, 90, 180, 270). Do not specify anything if not needed.
rotation=

# count, number of frames to encode or decode
count=50

# deblocking. 1 - Enable deblock
deblock=

# dering. 1 - Enable dering
dering=

# mirroring (0, 1, 2, 3)
mirror=

# width, display width for decoding or capture/yuv image width for encoding
width=1920

# height, display height for decoding or capture/yuv image height for encoding
height=1080

# bitrate. default is auto
bitrate=0

# gop size. default is 0
gop=15

# This option specifies the end of option list for one instance
# Each option list must end with this option. This is mandatory.
end
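The config format above is simple enough to check mechanically before a run. The following is a hypothetical helper, not part of imx-test; its rules are assumptions read off this sample file only: `#` starts a comment line, options are `key=value`, an empty value means "not specified", and a bare `end` closes the option list for one instance.

```python
from typing import Dict, List


def parse_vpu_test_config(text: str) -> List[Dict[str, str]]:
    """Split an mxc_vpu_test-style config into one dict per instance (hypothetical helper)."""
    instances: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line == "end":
            instances.append(current)  # close this instance's option list
            current = {}
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"unexpected line: {raw!r}")
        if value.strip():  # options left empty are treated as "not specified"
            current[key.strip()] = value.strip()
    if current:
        raise ValueError("option list not terminated by 'end'")
    return instances


sample = """
operation=1
input=/tmp/foo.bin
format=2
chromaInterleave=
width=1920
height=1080
end
"""
print(parse_vpu_test_config(sample))
```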

 

Both boards are running a Yocto Jethro build with the same VPU firmware and library versions as can be seen above.

 

The relevant differences between our board and the Nitrogen board are:

 

1. We run the VPU (and AXI) at 270 MHz vs 264 MHz on Nitrogen

2. We run the memory at 396 MHz vs 528 MHz on Nitrogen
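Putting the two differences next to the measured encode rates: the VPU clocks differ by only about 2%, while both the DRAM clock and the encode rate differ by roughly a third. With only two boards this is no proof, and other differences (memory type, bus width, board layout) are not accounted for, but it is consistent with encoding being limited by memory rather than by the VPU clock.

```python
# Ratios of the two boards' clocks against the "enc fps" values from mxc_vpu_test.
custom = {"vpu_mhz": 270, "ddr_mhz": 396, "enc_fps": 30.06}
nitrogen = {"vpu_mhz": 264, "ddr_mhz": 528, "enc_fps": 39.27}

vpu_ratio = nitrogen["vpu_mhz"] / custom["vpu_mhz"]
ddr_ratio = nitrogen["ddr_mhz"] / custom["ddr_mhz"]
fps_ratio = nitrogen["enc_fps"] / custom["enc_fps"]

print(f"VPU clock ratio (Nitrogen/custom): {vpu_ratio:.3f}")  # 0.978
print(f"DDR clock ratio (Nitrogen/custom): {ddr_ratio:.3f}")  # 1.333
print(f"enc fps ratio   (Nitrogen/custom): {fps_ratio:.3f}")  # 1.306
```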

 

I have attached the clock tree dumps from both our board and the Nitrogen board for reference.

 

In the documentation I have looked at, the only requirement I found is that the VPU must run at 264 MHz or more to encode 1920x1080 @ 30 fps. I can't find any reference to memory frequency in discussions of VPU performance.
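One way to read a clock requirement like this is as a per-macroblock cycle budget. The sketch below assumes the encoder processes 16x16 macroblocks back-to-back (1080 lines round up to 68 macroblock rows); if the VPU stalls waiting for DRAM, those stall cycles come out of the same budget, which is a plausible way memory speed can matter even though the requirement is quoted only as a VPU clock.

```python
import math


def cycles_per_macroblock(width: int, height: int, fps: float, clock_hz: float) -> float:
    """VPU clock cycles available per 16x16 macroblock at a given frame rate."""
    mbs_per_frame = math.ceil(width / 16) * math.ceil(height / 16)  # 120 * 68 = 8160 for 1080p
    return clock_hz / (mbs_per_frame * fps)


for mhz in (264, 270):
    budget = cycles_per_macroblock(1920, 1080, 30, mhz * 1e6)
    print(f"{mhz} MHz: {budget:.0f} cycles per macroblock")  # 1078 and 1103
```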

 

What are the relevant limitations on VPU performance? What can we do to achieve 30 fps?

 

Best regards,

Kristoffer Glembo

Original Attachment has been moved to: nitrogen_clocks.txt.zip

Original Attachment has been moved to: custom_clocks.txt.zip

donfreiling
Contributor III

I am also unable to achieve 1080p30 on my i.MX6Q board; I get 24-26 fps. My VPU clock is set to 264 MHz, and our DRAM is 32-bit at 1066 MHz. We run the same video pipeline on another product based on the i.MX6SL with 800 MHz DRAM and are able to achieve 45 fps at 1080p.

Were you able to find the reason for your performance issues?

Kind Regards,
Don 

Yuri
NXP Employee

Hello,

It looks like the problem is related to memory throughput.

Please try to run the MMDC Profiling tool (mmdc2 in /unit_tests).

Have a great day,
Yuri

kristofferglemb
Contributor III

Hi there Yuri,

I did, and the overall bus load during encoding is 70% on our board.

Here is the output of "MMDC_SLEEPTIME=5000 ./mmdc2" during encoding:

MMDC SUM
MMDC new Profiling results:
***********************
Measure time: 5000ms
Total cycles count: 1980055056
Busy cycles count: 1389737458
Read accesses count: 62189319
Write accesses count: 41704877
Read bytes count: 2617179024
Write bytes count: 1037496326
Avg. Read burst size: 42
Avg. Write burst size: 24
Read: 2760.99 MB/s /  Write: 197.89 MB/s  Total: 697.07 MB/s
Utilization: 32%
Overall Bus Load: 70%
Bytes Access: 35

Output of "MMDC_SLEEPTIME=5000 ./mmdc2 VPU":

MMDC VPU
MMDC new Profiling results:
***********************
Measure time: 5000ms
Total cycles count: 1980067376
Busy cycles count: 1390836045
Read accesses count: 15742568
Write accesses count: 15929047
Read bytes count: 190257152
Write bytes count: 206835208
Avg. Read burst size: 12
Avg. Write burst size: 12
Read: 200.71 MB/s /  Write: 39.45 MB/s  Total: 75.74 MB/s
Utilization: 3%
Overall Bus Load: 70%
Bytes Access: 12
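The derived figures in the two mmdc2 outputs can be recomputed from the raw counters. In this sketch "MB" is taken as 2^20 bytes and utilization assumes 8 bytes per busy cycle; both are assumptions, chosen because they reproduce the printed Write, Total and Utilization values. Total cycles over 5000 ms also work out to about 396 MHz, matching the DRAM clock. Computed the same way, the read bandwidth comes out at about 499 MB/s (SUM) and 36 MB/s (VPU) rather than the printed 2760.99 and 200.71 MB/s, so the printed Read figures should be treated with caution.

```python
from dataclasses import dataclass

MIB = 1024 * 1024  # assumption: mmdc2 "MB" means 2**20 bytes


@dataclass
class MmdcSample:
    """Raw counters copied from one mmdc2 run."""
    measure_ms: int
    total_cycles: int
    busy_cycles: int
    read_accesses: int
    write_accesses: int
    read_bytes: int
    write_bytes: int

    def bus_load(self) -> float:
        # Share of MMDC cycles that were busy.
        return self.busy_cycles / self.total_cycles

    def utilization(self, bytes_per_busy_cycle: int = 8) -> float:
        # Bytes moved per busy cycle relative to an ASSUMED 8-byte peak.
        return (self.read_bytes + self.write_bytes) / (self.busy_cycles * bytes_per_busy_cycle)

    def mib_per_s(self, nbytes: int) -> float:
        return nbytes / (self.measure_ms / 1000) / MIB

    def ddr_clock_mhz(self) -> float:
        # Total cycles divided by the measurement time gives the MMDC clock.
        return self.total_cycles / (self.measure_ms / 1000) / 1e6


sum_all = MmdcSample(5000, 1980055056, 1389737458, 62189319, 41704877, 2617179024, 1037496326)
vpu = MmdcSample(5000, 1980067376, 1390836045, 15742568, 15929047, 190257152, 206835208)

for name, s in (("SUM", sum_all), ("VPU", vpu)):
    total = s.read_bytes + s.write_bytes
    print(f"{name}: clock {s.ddr_clock_mhz():.0f} MHz, "
          f"bus load {int(s.bus_load() * 100)}%, "
          f"utilization {int(s.utilization() * 100)}%, "
          f"avg read burst {s.read_bytes // s.read_accesses}, "
          f"read {s.mib_per_s(s.read_bytes):.2f} / write {s.mib_per_s(s.write_bytes):.2f} "
          f"/ total {s.mib_per_s(total):.2f} MB/s")
```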

I get 78 % Overall Bus Load on the Nitrogen board.

I've also tried setting the QoS parameter for VPU in the NIC-301 to 0xF (real time):

./memtool -32 0xC49100=0xf
./memtool -32 0xC49104=0xf

It did only 0.5 fps and might be a bad idea.

I also tried setting mapType to 1 (frame MB map) and linear2TiledEnable to 1 in mxc_vpu_test, which gives better performance. Is it possible to use this when encoding H.264 into an MP4 container?

Best regards,

Kristoffer Glembo

kristofferglemb
Contributor III

I would also like more information about the mapType parameter. It is not well documented.

This is the enum from the imx-vpu library:

typedef enum {
        LINEAR_FRAME_MAP = 0,
        TILED_FRAME_MB_RASTER_MAP = 1,
        TILED_FIELD_MB_RASTER_MAP = 2,
        TILED_MAP_TYPE_MAX
} GDI_TILED_MAP_TYPE;

This is the information from the i.MX6Q reference manual:

There are 7 map types of frame buffer:
• Type 0 : linear map
• Type 1 : Frame based tiled map, horizontal addressing
• Type 2 : Frame based tiled map, vertical addressing
• Type 3 : Field based tiled map, vertical addressing
• Type 4 : Frame/Field mixed tiled map, vertical addressing
• Type 5 : Tiled MB Raster Frame Map
• Type 6 : Tiled MB Raster Field Map

In the chip, only Type 0, 5 and 6 are supported.
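Read by name alone, the three usable library values line up one-to-one with the three types the reference manual lists as supported. The sketch below records that reading; the correspondence is an inference from the names (which is exactly what the question asks to have confirmed), not documented behaviour, and the sentinel TILED_MAP_TYPE_MAX is left out.

```python
from enum import IntEnum


class GdiTiledMapType(IntEnum):
    # Values copied from the GDI_TILED_MAP_TYPE enum in imx-vpu (sentinel omitted).
    LINEAR_FRAME_MAP = 0
    TILED_FRAME_MB_RASTER_MAP = 1
    TILED_FIELD_MB_RASTER_MAP = 2


# Reference-manual map types documented as supported by the chip.
RM_SUPPORTED_TYPES = {0: "linear map", 5: "Tiled MB Raster Frame Map", 6: "Tiled MB Raster Field Map"}

# ASSUMPTION: correspondence inferred from the names only, not confirmed by NXP.
LIBRARY_TO_RM = {
    GdiTiledMapType.LINEAR_FRAME_MAP: 0,
    GdiTiledMapType.TILED_FRAME_MB_RASTER_MAP: 5,
    GdiTiledMapType.TILED_FIELD_MB_RASTER_MAP: 6,
}

for lib_type, rm_type in LIBRARY_TO_RM.items():
    print(f"mapType={int(lib_type)} ({lib_type.name}) -> RM Type {rm_type}: {RM_SUPPORTED_TYPES[rm_type]}")
```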

Can I trust the imx-vpu library that mapType should be 1 for the tiled MB raster frame map? Where can I find more information about the actual output order for each mapType? Are the different mapTypes generally supported for H.264?

Yuri
NXP Employee
NXP Employee

Hello,

Perhaps it would be better to experiment with the codec (quality) parameters in order to reduce the memory load.

Regards,

Yuri.
