Video Playback Performance Evaluation on i.MX6DQ Board

Document created by Phil Chen on Oct 9, 2012. Last modified by Jodi Paul on Apr 19, 2013. Version 2.

This article reports experiments that measure the video playback capability of the i.MX6DQ under different VPU clock settings.

 

1. Preparation

Board: i.MX6DQ SD

Bitstream: 1080p sunflower clip at 40 Mbps, considered the toughest H.264 clip. The original clip is repeated 20 times to produce a longer raw video, which is then encapsulated in an MP4 container. This minimizes the influence of GStreamer's startup workload when comparing against the VPU unit test.

Kernels: Separate kernels are built, one for each VPU clock setting: 270 MHz, 298 MHz, 329 MHz, 352 MHz, and 382 MHz.

Test setting: 1080p content is decoded and displayed on a 1080p device (no resize).
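The clip preparation described under Bitstream can be sketched as follows. This is a minimal illustration, not the procedure actually used for the article: it assumes the source is a raw H.264 elementary stream (Annex B byte stream) for which plain byte-wise concatenation yields a playable longer stream, and the function name `repeat_stream` is hypothetical. Muxing the result into an MP4 container is a separate step that is not shown.

```python
import io
import shutil
from typing import BinaryIO


def repeat_stream(src: BinaryIO, dst: BinaryIO, times: int = 20) -> None:
    """Append the full contents of src to dst, `times` times in a row.

    Assumption: src holds a raw elementary stream for which simple
    concatenation is valid (hypothetical helper, not from the article).
    """
    if times < 1:
        raise ValueError("times must be at least 1")
    for _ in range(times):
        src.seek(0)                     # rewind to the start of the clip
        shutil.copyfileobj(src, dst)    # chunked copy, safe for large files


# In-memory demonstration with placeholder bytes (not real H.264 data).
demo_src = io.BytesIO(b"\x00\x00\x00\x01demo")
demo_dst = io.BytesIO()
repeat_stream(demo_src, demo_dst, times=3)
print(len(demo_dst.getvalue()))
```

With real files, the same function would be called with two file objects opened in binary mode ("rb" for the source clip, "wb" for the repeated output).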

 

2. Test commands for the VPU unit test and GStreamer

Tiled-format video playback is faster than NV12-format playback, so the experiments below use the tiled format.

Unit test command (the frame rate is set with -a 70, higher than the 60 fps refresh rate of 1080p HDMI):

    /unit_tests/mxc_vpu_test.out -D "-i /media/65a78bbd-1608-4d49-bca8-4e009cafac5e/sunflower_2B_2ref_WP_40Mbps.264 -f 2 -y 1 -a 70"

GStreamer command (free run, to get the highest playback speed):

    gst-launch filesrc location=/media/65a78bbd-1608-4d49-bca8-4e009cafac5e/sunflower_2B_2ref_WP_40Mbps.mp4 typefind=true ! aiurdemux ! vpudec framedrop=false ! queue max-size-buffers=3 ! mfw_v4lsink sync=false

 

3. Video playback framerate measurement

During the tests, we run the command "echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor" so that the CPU always runs at its highest frequency and can respond to any interrupt quickly.
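Before each round it can be useful to confirm that the governor setting is still in effect. The sketch below reads the governor back for every CPU that exposes one. It assumes the standard Linux cpufreq sysfs layout (the same path used in the command above), and the helper names `read_governors` and `all_performance` are hypothetical.

```python
from pathlib import Path

# Standard cpufreq location; the article's command writes to cpu0 under it.
DEFAULT_CPU_ROOT = Path("/sys/devices/system/cpu")


def read_governors(cpu_root=DEFAULT_CPU_ROOT):
    """Return {cpu name: governor} for each CPU with a scaling_governor file."""
    governors = {}
    for gov_file in sorted(Path(cpu_root).glob("cpu[0-9]*/cpufreq/scaling_governor")):
        cpu_name = gov_file.parent.parent.name   # e.g. "cpu0"
        governors[cpu_name] = gov_file.read_text().strip()
    return governors


def all_performance(cpu_root=DEFAULT_CPU_ROOT):
    """True only if at least one CPU was found and all use 'performance'."""
    governors = read_governors(cpu_root)
    return bool(governors) and all(g == "performance" for g in governors.values())
```

On the board, calling `all_performance()` with no arguments checks the live sysfs tree; the `cpu_root` parameter exists only so the helper can be exercised against a test directory.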

For each VPU clock setting, we run 5 rounds of tests. The maximum and minimum playback values are discarded, and the remaining 3 values are averaged to give the final playback frame rate.
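The averaging rule can be written as a short function. This is a sketch of the procedure as described above, using the five unit-test playback values measured at 270 MHz; the function name `trimmed_mean` is our own. When the maximum occurs twice, as it does in this data set, only one instance is discarded.

```python
def trimmed_mean(samples):
    """Drop one minimum and one maximum, then average what remains."""
    if len(samples) < 3:
        raise ValueError("need at least 3 samples to drop min and max")
    ordered = sorted(samples)
    kept = ordered[1:-1]          # discard the smallest and the largest value
    return sum(kept) / len(kept)


# Playback fps of the five unit-test rounds at 270 MHz (from the results table).
unit_test_270mhz = [57.3, 57.04, 57.3, 56.15, 55.4]
print(round(trimmed_mean(unit_test_270mhz), 2))  # 56.83
```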

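All values are in fps.

| VPU clock | Test | #1 Dec | #1 Playback | #2 Dec | #2 Playback | #3 Dec | #3 Playback | #4 Dec | #4 Playback | #5 Dec | #5 Playback | Min Playback | Max Playback | Avg Playback |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 270M | unit test | 57.8 | 57.3 | 57.81 | 57.04 | 57.78 | 57.3 | 57.87 | 56.15 | 57.91 | 55.4 | 55.4 | 57.3 | 56.83 |
| 270M | GST | | 53.76 | | 54.163 | | 54.136 | | 54.273 | | 53.659 | 53.659 | 54.273 | 54.01967 |
| 298M | unit test | 60.97 | 58.37 | 60.98 | 58.55 | 60.97 | 57.8 | 60.94 | 58.07 | 60.98 | 58.65 | 57.8 | 58.65 | 58.33 |
| 298M | GST | | 56.755 | | 49.144 | | 53.271 | | 56.159 | | 56.665 | 49.144 | 56.755 | 55.365 |
| 329M | unit test | 63.8 | 59.52 | 63.92 | 52.63 | 63.8 | 58.1 | 63.82 | 58.26 | 63.78 | 59.34 | 52.63 | 59.52 | 58.56667 |
| 329M | GST | | 57.815 | | 55.857 | | 56.862 | | 58.637 | | 56.703 | 55.857 | 58.637 | 57.12667 |
| 352M | unit test | 65.79 | 59.63 | 65.78 | 59.68 | 65.78 | 59.65 | 66.16 | 49.21 | 65.93 | 57.67 | 49.21 | 59.68 | 58.98333 |
| 352M | GST | | 58.668 | | 59.103 | | 56.419 | | 58.08 | | 58.312 | 56.419 | 59.103 | 58.35333 |
| 382M | unit test | 64.34 | 56.58 | 67.8 | 58.73 | 67.75 | 59.68 | 67.81 | 59.36 | 67.77 | 59.76 | 56.58 | 59.76 | 59.25667 |
| 382M | GST | | 59.753 | | 58.893 | | 58.972 | | 58.273 | | 59.238 | 58.273 | 59.753 | 59.03433 |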

Note: The Dec columns give the VPU decoding frame rate in fps, while the Playback columns give the overall playback frame rate in fps.

[Figure: untitled.JPG]

Some explanation:

Why does the GStreamer performance keep improving with the VPU clock while the unit test result is flatter? GStreamer calls the VPU through a wrapper that makes the VPU API more intuitive to use. At the lowest clock, the overall GST playback performance is constrained by the VPU (VPU decoding at 57.8 fps). As the VPU clock increases and VPU decoding performance rises above 60 fps, the constraint becomes the 60 fps display refresh rate.

The video display overhead of GStreamer is only about 1 fps, similar to that of the unit test.
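The bottleneck argument above can be made concrete with a small calculation. The sketch below treats the playback ceiling as the slower of the VPU decode rate and the 60 fps display refresh rate, and reports the gap between that ceiling and the measured playback rate. This simple model is our reading of the explanation, not a formula from the measurements themselves; the function names are hypothetical, and the inputs are the round #1 Dec value and the averaged Playback value of the unit test at two clock settings.

```python
DISPLAY_REFRESH_FPS = 60.0  # 1080p 60 fps HDMI refresh rate quoted in the text


def playback_ceiling(decode_fps, refresh_fps=DISPLAY_REFRESH_FPS):
    """Upper bound on playback rate: the slower of decoder and display."""
    return min(decode_fps, refresh_fps)


def limiting_stage(decode_fps, refresh_fps=DISPLAY_REFRESH_FPS):
    """Name the stage that sets the ceiling in this simplified model."""
    return "VPU decode" if decode_fps < refresh_fps else "display refresh"


def overhead_fps(decode_fps, playback_fps, refresh_fps=DISPLAY_REFRESH_FPS):
    """Gap between the modelled ceiling and the measured playback rate."""
    return playback_ceiling(decode_fps, refresh_fps) - playback_fps


# (round #1 Dec fps, averaged Playback fps) for the unit test, from the table.
unit_test_points = {"270M": (57.8, 56.83), "352M": (65.79, 58.98333)}

for clock, (dec, play) in unit_test_points.items():
    print(f"{clock}: limited by {limiting_stage(dec)}, "
          f"overhead {overhead_fps(dec, play):.2f} fps")
```

Under this model the unit test loses roughly 1 fps to display at both clock settings, with the limiting stage switching from the VPU to the display once decoding exceeds 60 fps.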

 

Based on the test results, at 352 MHz the overall 1080p video playback on a 1080p display can reach ~60 fps (average playback of 58.98 fps for the unit test and 58.35 fps for GStreamer).

Alternatively, by time-sharing the VPU between two pipelines driving two displays, we can achieve 2 x 1080p @ 30 fps video playback.

However, these results apply only to 1080p video playback on a 1080p display. For interlaced clips, or for displays whose size is not 1080p, the overall playback performance is limited by postprocessing such as de-interlacing and resizing.
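The two-pipeline estimate can be checked with simple arithmetic. The sketch below compares the aggregate decode rate that two 30 fps streams require with the round #1 Dec values from the results table. It is a rough budget check under a strong assumption: it ignores the cost of switching the VPU between pipelines and any display overhead, so the clock actually needed in practice may be higher than this check suggests. The function names are hypothetical.

```python
def required_decode_fps(streams, fps_per_stream):
    """Aggregate decode rate needed to serve all streams."""
    return streams * fps_per_stream


def can_time_share(decode_fps, streams=2, fps_per_stream=30.0):
    """True if the measured decode rate covers the aggregate requirement."""
    return decode_fps >= required_decode_fps(streams, fps_per_stream)


# Round #1 VPU decode fps per clock setting (unit test rows of the table).
round1_decode_fps = {
    "270M": 57.8,
    "298M": 60.97,
    "329M": 63.8,
    "352M": 65.79,
    "382M": 64.34,
}

for clock, dec in round1_decode_fps.items():
    headroom = dec - required_decode_fps(2, 30.0)
    verdict = "ok" if can_time_share(dec) else "insufficient"
    print(f"{clock}: decode {dec} fps, headroom {headroom:+.2f} fps, {verdict}")
```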
