Hi there,
We're having performance issues when encoding 1920x1080 @ 30 fps on a custom board using an i.MX6Q with LPDDR2 PoP memory. The LPDDR2 memory is clocked at 396 MHz. The VPU is running at 270 MHz.
The VPU is not able to keep up at this rate. Using a GStreamer pipeline to create an MP4 from the encoded data, the limit currently seems to be below 24 fps.
Using the mxc_vpu_test program from the imx-test package, I get the following result on our board (encoding 50 1920x1080 YUV frames):
./mxc_vpu_test.out -C config_enc
[INFO] VPU test program built on Mar 15 2016 13:55:16
[INFO] Product Info: i.MX6Q/D/S
[INFO] VPU firmware version: 3.1.1_r46070
[INFO] VPU library version: 5.4.32
[INFO] Format: STD_AVC
[INFO] AVC
[INFO] Input file "/tmp/foo.bin" opened.
[INFO] Output file "test.264" opened.
[INFO] Capture/Encode fps will be 30
[INFO] ringBufferEnable 0, chromaInterleave 0, mapType 0, linear2TiledEnable 0
[INFO] Finished encoding: 50 frames
[INFO] enc fps = 30.06
[INFO] total fps= 21.32
Running exactly the same test on a Nitrogen board gives the following result:
./mxc_vpu_test.out -C config_enc
[INFO] VPU test program built on Mar 2 2016 08:53:02
[INFO] Product Info: i.MX6Q/D/S
[INFO] VPU firmware version: 3.1.1_r46070
[INFO] VPU library version: 5.4.32
[INFO] Format: STD_AVC
[INFO] AVC
[INFO] Input file "/tmp/foo.bin" opened.
[INFO] Output file "test.264" opened.
[INFO] Capture/Encode fps will be 30
[INFO] ringBufferEnable 0, chromaInterleave 0, mapType 0, linear2TiledEnable 0
[INFO] Finished encoding: 50 frames
[INFO] enc fps = 39.27
[INFO] total fps= 26.89
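To make the two runs easier to compare, the reported rates can be converted into per-frame times. The sketch below uses only the fps figures from the two logs above; the names RESULTS, frame_time_ms and summarize are illustrative and not part of mxc_vpu_test. Reading "enc fps" as encoder-only time and "total fps" as including the file read/write is an assumption based on the field names.

```python
# fps figures copied from the two mxc_vpu_test logs above.
RESULTS = {
    "custom":   {"enc_fps": 30.06, "total_fps": 21.32},
    "nitrogen": {"enc_fps": 39.27, "total_fps": 26.89},
}
TARGET_FPS = 30.0  # the 1080p30 goal stated in the post


def frame_time_ms(fps):
    """Average time spent per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def summarize(results, target_fps=TARGET_FPS):
    """Per-board frame times and the margin against the target frame budget."""
    budget_ms = frame_time_ms(target_fps)
    rows = {}
    for board, r in results.items():
        enc_ms = frame_time_ms(r["enc_fps"])
        total_ms = frame_time_ms(r["total_fps"])
        rows[board] = {
            "enc_ms": enc_ms,
            "total_ms": total_ms,
            # Assumed to be file I/O and other non-encode work per frame.
            "other_ms": total_ms - enc_ms,
            "enc_headroom_ms": budget_ms - enc_ms,
        }
    return rows


for board, row in summarize(RESULTS).items():
    print(board, {k: round(v, 2) for k, v in row.items()})
```

With these figures the encode step on the custom board takes roughly 33.3 ms per frame, which is essentially the entire 1080p30 budget, while the Nitrogen board has several milliseconds to spare.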
The config_enc config file looks like this:
# Write your options here!
# Type of operation encode or decode; encode = 1, decode = 2
operation=1
# read input from file. Mandatory for decode. If not specified for encode
# then default is camera
input=/tmp/foo.bin
# write output to file. For decode, if not specified, then default is LCD
output=test.264
# format; 0 - MPEG4, 1 - H.263, 2 - H.264, 7 - MJPG
format=2
# chromaInterleave, 1 - CbCr is interleaved
chromaInterleave=
# rotation angle (0, 90, 180, 270). Do not specify anything if not needed.
rotation=
# count, number of frames to encode or decode
count=50
# deblocking . 1 - Enable deblock
deblock=
# dering . 1 - Enable dering
dering=
# mirroring (0, 1, 2 , 3)
mirror=
# width, display width for decoding or capture/yuv image width for encoding
width=1920
# height, display height for decoding or capture/yuv image width for encoding
height=1080
# bitrate. default is auto
bitrate=0
# gop size. default is 0
gop=15
# This option specifies the end of option list for one instance
# Each option list must be end with this option. This is mandatory.
end
Both boards are running a Yocto Jethro build with the same VPU firmware and library versions as can be seen above.
The relevant differences between our board and the Nitrogen board are:
1. We run the VPU (and AXI) at 270 MHz vs 264 MHz on Nitrogen
2. We run the memory at 396 MHz vs 528 MHz on Nitrogen
I have attached the clock tree dump from both our board and the Nitrogen board for reference.
Looking at various documentation, I've only been able to find that we need to run the VPU at a minimum of 264 MHz to encode 1920x1080 @ 30 fps. I can't find any references to memory frequency in VPU performance discussions.
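As a rough sanity check on which of the two clock differences matters, the clock ratios can be set against the ratio of the measured "enc fps" values. This is only a sketch with two data points, so it cannot prove a cause; the table and function names are illustrative.

```python
# Clock settings listed above and "enc fps" values from the two test logs.
VPU_MHZ = {"custom": 270.0, "nitrogen": 264.0}
DDR_MHZ = {"custom": 396.0, "nitrogen": 528.0}
ENC_FPS = {"custom": 30.06, "nitrogen": 39.27}


def custom_over_nitrogen(table):
    """Ratio of the custom board's value to the Nitrogen board's value."""
    return table["custom"] / table["nitrogen"]


vpu_ratio = custom_over_nitrogen(VPU_MHZ)
ddr_ratio = custom_over_nitrogen(DDR_MHZ)
fps_ratio = custom_over_nitrogen(ENC_FPS)

print(f"VPU clock ratio:    {vpu_ratio:.3f}")
print(f"memory clock ratio: {ddr_ratio:.3f}")
print(f"enc fps ratio:      {fps_ratio:.3f}")
```

The enc fps ratio (about 0.77) sits close to the memory clock ratio (0.75) and on the opposite side of 1.0 from the VPU clock ratio (about 1.02), which is at least consistent with memory speed being the limiting factor.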
What are the relevant limitations on VPU performance? What can we do to achieve 30 fps?
Best regards,
Kristoffer Glembo
Original Attachment has been moved to: nitrogen_clocks.txt.zip
Original Attachment has been moved to: custom_clocks.txt.zip
I am also unable to achieve 1080p30 on my i.MX6Q board. I get 24-26 fps. My VPU clock is set to 264 MHz. Our DRAM is 32-bit, 1066 MHz. We are running the same video pipeline on another product that is based on the i.MX6SL with 800 MHz DRAM and are able to achieve 45 fps for 1080p.
Were you able to find the reason for your performance issues?
Kind Regards,
Don
Hello,
It looks like the problem is a memory throughput issue.
Please try to run the MMDC Profiling tool (mmdc2 in /unit_tests).
Have a great day,
Yuri
Hi there Yuri,
I did, and the overall bus load during encoding is 70 % on our board.
Here is the output of "MMDC_SLEEPTIME=5000 ./mmdc2" during encoding:
MMDC SUM
MMDC new Profiling results:
***********************
Measure time: 5000ms
Total cycles count: 1980055056
Busy cycles count: 1389737458
Read accesses count: 62189319
Write accesses count: 41704877
Read bytes count: 2617179024
Write bytes count: 1037496326
Avg. Read burst size: 42
Avg. Write burst size: 24
Read: 2760.99 MB/s / Write: 197.89 MB/s Total: 697.07 MB/s
Utilization: 32%
Overall Bus Load: 70%
Bytes Access: 35
Output of "MMDC_SLEEPTIME=5000 ./mmdc2 VPU":
MMDC VPU
MMDC new Profiling results:
***********************
Measure time: 5000ms
Total cycles count: 1980067376
Busy cycles count: 1390836045
Read accesses count: 15742568
Write accesses count: 15929047
Read bytes count: 190257152
Write bytes count: 206835208
Avg. Read burst size: 12
Avg. Write burst size: 12
Read: 200.71 MB/s / Write: 39.45 MB/s Total: 75.74 MB/s
Utilization: 3%
Overall Bus Load: 70%
Bytes Access: 12
I get 78 % Overall Bus Load on the Nitrogen board.
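The summary lines in the mmdc2 output can be cross-checked against the raw counters. The sketch below recomputes throughput, bus load and the effective memory clock from the counts in the two dumps above. It assumes that "MB/s" in the tool output means 2^20 bytes per second, which is inferred from the Write and Total lines matching under that assumption; the names SUM, VPU and derive are illustrative.

```python
MIB = 1024 * 1024  # assumption: the tool's "MB/s" is 2**20 bytes per second

# Raw counters copied from the "MMDC SUM" and "MMDC VPU" dumps above.
SUM = {"ms": 5000, "total_cycles": 1980055056, "busy_cycles": 1389737458,
       "read_bytes": 2617179024, "write_bytes": 1037496326}
VPU = {"ms": 5000, "total_cycles": 1980067376, "busy_cycles": 1390836045,
       "read_bytes": 190257152, "write_bytes": 206835208}


def derive(c):
    """Recompute the summary figures from one set of raw MMDC counters."""
    seconds = c["ms"] / 1000.0
    read = c["read_bytes"] / seconds / MIB
    write = c["write_bytes"] / seconds / MIB
    return {
        "read_mib_s": read,
        "write_mib_s": write,
        "total_mib_s": read + write,
        "bus_load_pct": 100.0 * c["busy_cycles"] / c["total_cycles"],
        "clock_mhz": c["total_cycles"] / seconds / 1e6,
    }


for name, counters in (("SUM", SUM), ("VPU", VPU)):
    print(name, {k: round(v, 2) for k, v in derive(counters).items()})

# The two dumps come from separate 5 s windows, so this share is approximate.
vpu_share = (VPU["read_bytes"] + VPU["write_bytes"]) / (
    SUM["read_bytes"] + SUM["write_bytes"])
print(f"VPU share of total traffic: {100 * vpu_share:.1f}%")
```

Under that assumption the recomputed Write and Total figures agree with the printed ones (197.89 and 697.07 for SUM, 39.45 and 75.74 for VPU), and the cycle count corresponds to the 396 MHz memory clock. The recomputed Read figures come out near 499 and 36 rather than the printed 2760.99 and 200.71, so the printed Read values look unreliable; the reason is not clear from the output alone.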
I've also tried setting the QoS parameter for VPU in the NIC-301 to 0xF (real time):
./memtool -32 0xC49100=0xf
./memtool -32 0xC49104=0xf
It did only 0.5 fps and might be a bad idea.
I also tried setting mapType to 1 (frame MB map) and linear2TiledEnable to 1 in mxc_vpu_test, which gives better performance. Is it possible to use this when encoding H.264 into an MP4 container?
Best regards,
Kristoffer Glembo
I would also like more information about the mapType parameter. It is not well documented.
This is the enum from imx-vpu library:
typedef enum {
    LINEAR_FRAME_MAP = 0,
    TILED_FRAME_MB_RASTER_MAP = 1,
    TILED_FIELD_MB_RASTER_MAP = 2,
    TILED_MAP_TYPE_MAX
} GDI_TILED_MAP_TYPE;
This is the information from the i.MX6Q reference manual:
There are 7 map types of frame buffer:
• Type 0 : linear map
• Type 1 : Frame based tiled map, horizontal addressing
• Type 2 : Frame based tiled map, vertical addressing
• Type 3 : Field based tiled map, vertical addressing
• Type 4 : Frame/Field mixed tiled map, vertical addressing
• Type 5 : Tiled MB Raster Frame Map
• Type 6 : Tiled MB Raster Field Map
In the chip, only Type 0, 5 and 6 are supported.
Can I trust the imx-vpu library that mapType should be 1 for the tiled MB raster frame map? Where can I find more information about the actual output order depending on mapType? Are the different mapTypes generally supported by H.264?
Hello,
Perhaps it would be better to experiment with the codec (quality) parameters
in order to decrease the memory load.
Regards,
Yuri.