VPU performance on i.MX6 Solo (1080p30 encode + loopback)


egmedical
Contributor I

We would like to do 1080p30 video encoding (H.264 if possible) and preview (video loopback) at the same time on i.MX6. Is that possible?

We did some measurements: we can do either 1080p30 encoding or video loopback, but doing both at the same time seems problematic.

Measurements:

I measured VPU video encoding performance with the mxc_vpu_test.out unit test. Since it is not possible to encode a live stream from the camera while running mxc_v4l2_overlay.out or a GStreamer pipeline with mfw_v4lsrc (two processes would be accessing one V4L device), I saved 100 frames from the camera to a ramdisk (/tmp) via filesink and used that file to measure encoder performance. The input data is therefore always exactly the same.
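For a sense of how much ramdisk space such a 100-frame clip needs, here is a rough Python estimate. It is only a sketch: the post does not state the pixel format or any row padding, so the planar YUV 4:2:0 layout at exactly 1920x1080 is an assumption, and the function names are made up for illustration.

```python
# Assumptions (not stated in the post): raw planar YUV 4:2:0 frames
# (12 bits per pixel) at exactly 1920x1080, with no row padding.
WIDTH = 1920
HEIGHT = 1080
FRAMES = 100


def yuv420_frame_bytes(width, height):
    """Bytes in one YUV 4:2:0 frame: a full-size luma plane plus
    two chroma planes subsampled by 2 in each direction."""
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)
    return luma + chroma


def clip_bytes(width, height, frames):
    """Total size of a clip of uncompressed frames."""
    return yuv420_frame_bytes(width, height) * frames


total = clip_bytes(WIDTH, HEIGHT, FRAMES)
print(f"one frame : {yuv420_frame_bytes(WIDTH, HEIGHT)} bytes")
print(f"{FRAMES} frames: {total} bytes ({total / 2**20:.1f} MiB)")
```

Under these assumptions the clip is roughly 300 MiB, so /tmp needs at least that much free RAM on top of what the encoder and display pipelines use.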

In the table below you can see the results (enc fps from mxc_vpu_test.out) for several different scenarios:

  • Only mxc_vpu_test.out running, so there is nothing else utilizing the VPU except the encoder test itself.
  • mxc_vpu_test.out running together with a GStreamer pipeline consisting of mfw_v4lsrc and mfw_isink.
  • mxc_vpu_test.out running together with the mxc_v4l2_overlay.out unit test, which passes data directly from the camera to the framebuffer.

We use a 32-bit framebuffer, because 16 bits means fewer colors and ugly banding in the image on the display. I have also included results measured with a 16-bit framebuffer for comparison.

Only mxc_vpu_test.out running (enc fps)

         |            bpp = 16            |            bpp = 32
codec    | gop=1   gop=5   gop=10  gop=15 | gop=1   gop=5   gop=10  gop=15
MPEG4    | 53.27   46.92   46.23   46.02  | 53.27   46.02   45.26   45.02
H.263    | 52.78   48.56   48.09   47.95  | 52.77   48.52   48.05   47.91
H.264    | 48.57   46.49   46.24   46.18  | 48.56   45.87   45.54   45.46
MJPG     | 139.31  -       -       -      | 139.47  -       -       -

mxc_vpu_test.out & gst-launch mfw_v4lsrc ! mfw_isink (enc fps)

         |            bpp = 16            |            bpp = 32
codec    | gop=1   gop=5   gop=10  gop=15 | gop=1   gop=5   gop=10  gop=15
MPEG4    | 40.75   29.08   28.03   27.74  | 30.31   21.53   20.71   20.51
H.263    | 40      28.95   27.98   27.72  | 30.06   21.41   20.63   20.44
H.264    | 35.22   26.83   26.04   25.81  | 26.39   19.74   19.11   18.96
MJPG     | 110.73  -       -       -      | 82.03   -       -       -

mxc_vpu_test.out & mxc_v4l2_overlay.out (enc fps)

codec    | gop=1   gop=5   gop=10  gop=15
MPEG4    | 52.97   39.87   38.69   38.33
H.263    | 52.68   40.53   39.39   39.06
H.264    | 47.67   38.06   37.12   36.88
MJPG     | 135.51  -       -       -

As you can see, there is a huge performance drop when using mfw_isink for video loopback. The performance is not sufficient for encoding the video stream in real time, so the resulting stream is missing some frames. Using mxc_v4l2_overlay.out seems to be a much better alternative; unfortunately, we are not sure whether we can combine it with GStreamer.
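To put numbers on that drop, the following Python sketch takes the H.264 rows from the table, computes the relative loss against the encoder-only baseline, and checks each result against the 30 fps needed for 1080p30. The function and variable names are only illustrative; the fps values are copied from the measurements above. The overlay row is only checked against the 30 fps target, because its framebuffer depth is not indicated in the table.

```python
# H.264 encoder throughput (fps) at gop = 1, 5, 10, 15, copied from the table.
GOPS = [1, 5, 10, 15]
TARGET_FPS = 30.0  # needed to encode 1080p30 in real time

ENCODER_ONLY = {
    16: [48.57, 46.49, 46.24, 46.18],
    32: [48.56, 45.87, 45.54, 45.46],
}
WITH_ISINK = {
    16: [35.22, 26.83, 26.04, 25.81],
    32: [26.39, 19.74, 19.11, 18.96],
}
WITH_OVERLAY = [47.67, 38.06, 37.12, 36.88]


def drop_percent(baseline, loaded):
    """Relative throughput loss of `loaded` versus `baseline`, in percent."""
    return 100.0 * (baseline - loaded) / baseline


def isink_report(bpp):
    """Return (gop, drop in %, meets 30 fps?) for each GOP size at this depth."""
    rows = []
    for gop, base, loaded in zip(GOPS, ENCODER_ONLY[bpp], WITH_ISINK[bpp]):
        rows.append((gop, round(drop_percent(base, loaded), 1), loaded >= TARGET_FPS))
    return rows


def meets_realtime(fps_values):
    """True for each measurement that reaches the real-time target."""
    return [fps >= TARGET_FPS for fps in fps_values]


for bpp in (16, 32):
    for gop, drop, ok in isink_report(bpp):
        verdict = "yes" if ok else "no"
        print(f"isink   bpp={bpp} gop={gop:2d} drop={drop:5.1f}% real-time={verdict}")
for gop, ok in zip(GOPS, meets_realtime(WITH_OVERLAY)):
    print(f"overlay gop={gop:2d} real-time={'yes' if ok else 'no'}")
```

With mfw_isink, only the 16 bpp, gop = 1 case stays above 30 fps; every overlay measurement does.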

Is it possible to use mxc_v4l2_overlay.out for video loopback and GStreamer for video saving at the same time?

Thanks a lot for any answer.

igorpadykov
NXP Employee

Hi Ivo

i.MX6S supports 1080p30 encode + decode, but this is the maximum bare-metal performance capability of the VPU. Usually it can be obtained in an OS-less environment, which avoids OS side effects; that is, it is the performance of the VPU module itself. It can probably be obtained in your case too, but the software should be optimized to reach these maximum figures. In particular, the VPU frequency should be configured to 350MHz to obtain the maximum performance.

Best regards

igor

egmedical
Contributor I

Hi Igor,

well, it seems that the VPU itself is powerful enough to do 1080p30 encoding quite well, but the mfw_isink used to display the image at the same time is causing a major performance drop. I would like to avoid mfw_isink for video loopback and use a more direct path, as mxc_v4l2_overlay.out probably does. But I have no idea whether it is possible to configure the IPU/VPU to pass the image to the display directly AND save the video stream at the same time (preferably using GStreamer). Is there any way to do that?

I have tried locking the VPU clock to 352 MHz (CONFIG_MX6_VPU_352M=y in the kernel), but there seems to be no performance change whatsoever. The results in the first post were measured with CONFIG_MX6_VPU_352M enabled.

Best regards

Ivo

igorpadykov
NXP Employee

Hi Ivo

You can try the latest BSP; it has improved VPU firmware and GStreamer 1.x support:

L3.10.53_1.1.0_iMX6QDLS_Bundle : i.MX 6Quad, i.MX 6Dual, i.MX 6DualLite, i.MX 6Solo

Linux Binary Demo Files and Linux BSP Documentation

Best regards

igor

egmedical
Contributor I

Hi Igor,

since we use a 3rd-party SoM card, it is quite difficult to just try the latest BSP from Freescale until the SoM manufacturer integrates the needed changes. We currently use a BSP based on fsl-L3.10.17_1.0.0GA release 3.0.35 kernel (sorry, I am unsure which BSP release it is) with GStreamer 0.10. Is there any way to do that in our BSP?

Best regards

Ivo

igorpadykov
NXP Employee

Hi Ivo

Have you tried the -L option (loopback), as in the attached file?

Best regards

igor

egmedical
Contributor I

Hi Igor,

thanks for the tip. I was not aware of the -L option of the VPU unit test. I tried it a few minutes ago and the latency is terrible. The processing introduces more than 400 ms of delay, which is unacceptable. That is much worse than the GStreamer pipeline with isink, which delays the signal by about 140 ms.
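To relate these delays to the 30 fps stream, a small Python sketch can convert them into frame periods. The helper name is made up for illustration, and the 400 ms figure is the lower bound quoted above.

```python
def latency_in_frames(latency_ms, fps=30.0):
    """Express a delay in milliseconds as a number of frame periods at `fps`."""
    return latency_ms * fps / 1000.0


for label, latency_ms in (("-L loopback", 400), ("GStreamer + isink", 140)):
    frames = latency_in_frames(latency_ms)
    print(f"{label}: {latency_ms} ms is about {frames:.1f} frames at 30 fps")
```

At 30 fps, 400 ms corresponds to about 12 frame periods and 140 ms to about 4.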

Best regards,

Ivo

igorpadykov
NXP Employee

Hi Ivo

The i.MX6 VPU docs do not provide numbers for latencies (delays); only fps figures are provided (guaranteed).

Best regards

igor
