i.MX8X: m-jpeg to h264 VPU transcoding speed


8,140 views
gabrielvalcazar
Contributor IV

Hi all,

I recently got VPU transcoding to work on my i.MX8X-based board after some device tree changes. See the original thread here: https://community.nxp.com/t5/i-MX-Processors/i-MX8X-transcode-m-jpeg-video-to-h264-with-gstreamer-us...

I'm using a custom C0 i.MX8X board with a custom Yocto Linux based on NXP's v5.4.70_2.3.0 BSP. The gstreamer pipeline I'm currently using for the transcoding operation is the following:

 

gst-launch-1.0 -ve filesrc location=OBC_mjpeg.avi ! avidemux ! v4l2jpegdec ! imxvideoconvert_g2d ! v4l2convert ! v4l2h264enc ! h264parse ! mp4mux ! filesink location=OBC_h264.mp4

 

This works, but it takes 3 times as long as the video duration. For example, my source video is 8 seconds long, and the transcoding process takes about 24 seconds.

Is it possible to achieve transcoding in real time (or practically real time)? If so, what GStreamer pipeline would reach such speeds?

Thanks in advance,

Gabriel
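To put "three times as long" in numbers, here is a minimal Python sketch that turns a clip length and a measured transcode time into a real-time factor and a per-frame time budget. It assumes a constant frame rate; the 30 fps used below is an assumption (later posts disagree on whether this clip is 30 or 60 fps).

```python
def realtime_stats(video_seconds: float, transcode_seconds: float, fps: float) -> dict:
    """Compare a transcode run against real time, assuming a constant frame rate."""
    frames = round(video_seconds * fps)              # total frames in the clip
    budget_ms = 1000.0 / fps                         # per-frame time allowed for real time
    actual_ms = 1000.0 * transcode_seconds / frames  # per-frame time actually spent
    return {
        "frames": frames,
        "realtime_factor": transcode_seconds / video_seconds,  # 1.0 means real time
        "budget_ms": budget_ms,
        "actual_ms": actual_ms,
        "effective_fps": frames / transcode_seconds,
    }

# 8 s clip, 24 s transcode; 30 fps is an ASSUMPTION
stats = realtime_stats(8.0, 24.0, 30.0)
for key, value in stats.items():
    print(f"{key}: {value:.2f}" if isinstance(value, float) else f"{key}: {value}")
```

Under that assumption, the pipeline would need to finish each frame in about 33 ms, while the 24 s run averages about 100 ms per frame, i.e. roughly 10 fps.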

 

26 replies

gabrielvalcazar
Contributor IV

Hi all,

Although there has been some progress since this thread was originally created, we are still unable to get transcoding on the 8X fast enough for a real-time scenario. We're now exploring other alternatives, such as using the 8M Mini instead (with the same OS, based on NXP's v5.4.70_2.3.0 Linux BSP).

Given the tests we've performed so far, it seems like the transcoding is much faster on this platform (we're using the sample_1920x1080.mjpeg reference video as a base). However, there are some things we'd like to ask regarding the 8M Mini's VPU usage and the resulting videos:

  • When we want to strictly decode videos (not transcode them), we've tested the following pipelines:
    • $ gst-launch-1.0 filesrc location=sample_1920x1080.mjpeg ! decodebin ! queue ! waylandsink
    • $ gst-launch-1.0 filesrc location=sample_1920x1080.mjpeg ! jpegparse ! jpegdec ! queue ! waylandsink qos=false sync=false
    • $ gst-launch-1.0 filesrc location=sample_1920x1080.mjpeg ! jpegparse ! jpegdec ! queue ! imxvideoconvert_g2d ! videoscale ! queue ! waylandsink qos=false sync=false
  • The decoding works fine, but with high CPU usage: either one of the CPUs is used at nearly 100%, or the load is spread across the 4 CPUs at 20-25% each. Is there a better pipeline that reduces the CPU overhead to a minimum and makes the most of the VPU?
  • When transcoding, the process is faster than on the 8X, but the resulting video plays at 1 fps. There is no frame loss (if you speed up the video 30x, it plays fine), but is there a reason behind this very slow framerate in the output video? Here are the pipelines we've tested:
    • $ gst-launch-1.0 filesrc location=sample_1920x1080.mjpeg ! jpegparse ! jpegdec ! queue ! videoconvert ! vpuenc_h264 ! h264parse ! matroskamux ! filesink location=out.mkv
    • $ gst-launch-1.0 filesrc location=sample_1920x1080.mjpeg ! jpegparse ! jpegdec ! queue ! imxvideoconvert_g2d ! vpuenc_h264 ! h264parse ! matroskamux ! filesink location=out1.mkv
  • Note that, even though the input video is the same, the final video size is different depending on the pipeline used. This isn't critical, but we're still curious as to why this happens:
    • root@ccimx8mm-dvk:~/OBC# ls -l
      -rw-r--r-- 1 root root 44289853 Sep 23 09:40 out.mkv
      -rw-r--r-- 1 root root 54156493 Sep 23 09:22 out1.mkv
      -rwxr-x--- 1 root root 70056432 Sep 3 15:55 sample_1920x1080.mjpeg
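A quick way to quantify the size difference is to compare the byte counts from the listing above. Assuming both outputs contain the same frames over the same duration, the size ratio equals the ratio of their average bitrates, so out1.mkv carries roughly 22% more bits than out.mkv. The clip duration is not given in the thread, so the bitrate helper takes it as a parameter.

```python
# Sizes in bytes, copied from the `ls -l` listing above.
sizes = {
    "sample_1920x1080.mjpeg": 70056432,
    "out.mkv": 44289853,   # jpegdec -> videoconvert -> vpuenc_h264
    "out1.mkv": 54156493,  # jpegdec -> imxvideoconvert_g2d -> vpuenc_h264
}

def ratio(a: str, b: str) -> float:
    """Return size(a) / size(b)."""
    return sizes[a] / sizes[b]

def avg_bitrate_mbps(size_bytes: int, duration_s: float) -> float:
    """Average bitrate in Mbit/s; the clip duration is NOT stated in the thread."""
    return size_bytes * 8 / duration_s / 1e6

print(f"out1.mkv / out.mkv: {ratio('out1.mkv', 'out.mkv'):.3f}")
print(f"out.mkv / source:   {ratio('out.mkv', 'sample_1920x1080.mjpeg'):.3f}")
print(f"out1.mkv / source:  {ratio('out1.mkv', 'sample_1920x1080.mjpeg'):.3f}")
```

This narrows the question to why the encoder spends more bits in the imxvideoconvert_g2d pipeline than in the videoconvert one.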

Attached are the pipeline graphs for each of the pipelines mentioned above.

Let me know if it's OK to discuss the 8M Mini transcoding performance in this thread or if I should create a separate one.

Gabriel

gabrielvalcazar
Contributor IV

Hi @joanxie ,

Any advice on moving forward with transcoding on the 8M Mini? Should we move this topic to a separate thread?

gabrielvalcazar
Contributor IV

Hi @joanxie ,

Could we get some feedback regarding the i.MX8M Mini transcoding speed, even if in a separate thread? We have customers who rely on this feature, and we would like to be able to tell them confidently whether it's feasible or not.

gabrielvalcazar
Contributor IV

Hi @joanxie ,

Any comments on the i.MX8M Mini transcoding speed as explained above? Our customers require this information before making important design decisions for their end products.

gabrielvalcazar
Contributor IV

Hi again,

Any advice regarding the transcoding speed? This is still an issue that I'd like to solve.

Is the slow speed I'm seeing a hard limitation or is it something that can be improved with a different pipeline and/or driver changes?

Best regards,

Gabriel

joanxie
NXP TechSupport

Try adding queue elements to the pipeline to improve performance, for example:

gst-launch-1.0 -ve filesrc location=OBC_mjpeg.avi ! avidemux ! v4l2jpegdec ! queue ! imxvideoconvert_g2d ! queue ! v4l2convert ! queue ! v4l2h264enc ! h264parse ! mp4mux ! filesink location=OBC_h264.mp4
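Why queue can help: each queue element starts a new streaming thread on its source side, so the elements on either side of it can work on different frames at the same time. The sketch below is a simplified steady-state model (perfect overlap, no memory-bandwidth contention, no start-up cost) in which the per-frame time drops from the sum of all stage costs to the cost of the slowest thread segment. The stage grouping and costs are hypothetical placeholders, not measurements from this thread.

```python
from typing import Sequence

def frame_time_ms(stage_ms: Sequence[float], queue_after: Sequence[bool]) -> float:
    """Estimate steady-state time per frame for a linear pipeline.

    stage_ms[i] is the per-frame cost of element i; queue_after[i] is True when a
    queue follows element i, starting a new thread. Elements between queues share
    one thread, so their costs add; separate threads overlap, so the slowest
    thread segment sets the pace (simplified model).
    """
    if len(queue_after) != len(stage_ms):
        raise ValueError("queue_after must have one entry per stage")
    segments, current = [], 0.0
    for cost, has_queue in zip(stage_ms, queue_after):
        current += cost
        if has_queue:          # a queue ends this thread segment
            segments.append(current)
            current = 0.0
    segments.append(current)   # the last segment runs up to the sink
    return max(segments)

# HYPOTHETICAL costs (ms) for: jpeg decode, g2d convert, v4l2convert, h264 encode + mux
stages = [30.0, 15.0, 20.0, 35.0]
print(f"no queues:   {frame_time_ms(stages, [False] * 4):.0f} ms/frame")
print(f"with queues: {frame_time_ms(stages, [True, True, True, False]):.0f} ms/frame")
```

The same model shows the limit of this approach: once one segment dominates, adding more queues does not help, which fits the later finding that a single memory copy sets the pace.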

 

 

gabrielvalcazar
Contributor IV

Hi @joanxie ,

Thanks for the queue suggestion. It definitely changes the behavior of the transcoding process, but I'm afraid it's still not fast enough.

For one, even though it's faster than before, it still takes a lot of time (17 seconds versus 28). Another thing I noticed is that the output video is double the length of the original one and is at 30 FPS instead of 60 FPS, even though it seems to play at the correct speed.
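The length and frame-rate symptoms in this thread (double length at 30 vs 60 FPS here, and the 1 fps output on the 8M Mini that plays fine when sped up 30x) are consistent with frames being kept but timestamped at a different rate, since playback duration is simply frame count divided by frame rate. The sketch below only illustrates that arithmetic; the 240-frame count is an assumption derived from the customer's "30 FPS, about 8 seconds".

```python
from fractions import Fraction

def duration_s(frame_count: int, fps: Fraction) -> Fraction:
    """Playback duration when frame_count frames are timestamped at a constant fps."""
    return Fraction(frame_count) / fps

frames = 240  # ASSUMPTION: 8 s at the intended 30 fps
print(float(duration_s(frames, Fraction(30))))  # as intended
print(float(duration_s(frames, Fraction(60))))  # same frames read as 60 fps
print(float(duration_s(frames, Fraction(1))))   # same frames muxed at 1 fps
```

With the same frames, reading the source as 60 fps halves its apparent length relative to a 30 fps output, and muxing at 1 fps stretches a 30 fps clip exactly 30 times.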

Here's some feedback from the customer that is interested in the real time transcoding feature, with more detailed information:

Regarding the framerate of the video file, I can confirm that it is intended to be 30 FPS and approximately 8 seconds long. However, I have observed it playing at 60 fps for some reason, even though this is not correct. I previously linked another MJPEG file that could be used as an alternative test, as it has been encoded by a third party and appears to have no frame rate issues, but it is also still slow to transcode. For ease, here is the link again, along with the other issues I had observed at the time:

  1. As you discovered, the video is not processed fast enough to be used in real-time. I have confirmed this with another recorded MJPEG file (https://filesamples.com/samples/video/mjpeg/sample_1920x1080.mjpeg) and with an MJPEG IP camera. I can also confirm that the CPU is not overloaded at all when doing this. The pipeline I have used to transcode the above file is as follows:

gst-launch-1.0 -e filesrc location=sample_1920x1080.mjpeg ! jpegparse ! v4l2jpegdec ! imxvideoconvert_g2d ! video/x-raw,format=YUY2 ! imxvideoconvert_g2d ! v4l2h264enc ! h264parse ! mp4mux ! filesink location=test.mp4

  2. When streaming from an MJPEG IP camera, it appears necessary to manually specify the video framerate in a caps filter. Otherwise, the JPEG decoder consistently reports a framerate of 0 fps, which is not supported by the H.264 encoder. However, this can easily be worked around.

  3. When directly linking the JPEG decoder to the H.264 encoder, the output video is corrupted because the H.264 encoder interprets the colourspace incorrectly. For me, this occurs with the following pipeline using the sample video linked above:

gst-launch-1.0 filesrc location=sample_1920x1080.mjpeg ! jpegparse ! v4l2jpegdec ! v4l2h264enc ! h264parse ! mp4mux ! filesink location=test.mp4

However, this can be worked around by converting to YUY2 format and back as in the original pipeline for this sample.

  4. It does not seem possible to convert the output of the MJPEG decoder to any RGB format (RGBA, RGBx, xRGB, BGRx, etc.) and then re-encode it as H.264. The error given in this case is:

g2d_opencl_conversion opencl conversion does not support input format 1 (this number varies from 1-8 depending on the RGB format selected)

The library causing this error appears to be imx-dpu-g2d, but I cannot investigate further as it is closed-source.
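For the IP-camera case above, where the decoder reports 0 fps unless the framerate is set by hand: GStreamer caps express framerate as a fraction (for example 30/1, or 30000/1001 for 29.97 fps). The helper below is a hypothetical convenience for generating such a caps-filter string from a nominal rate. The `image/jpeg` default media type and the placement of the filter in the pipeline are assumptions to adapt to the actual camera pipeline.

```python
from fractions import Fraction

# Common NTSC-style rates, expressed as exact N/1001 fractions.
_NTSC = {23.976: Fraction(24000, 1001), 29.97: Fraction(30000, 1001), 59.94: Fraction(60000, 1001)}

def framerate_caps(fps: float, media_type: str = "image/jpeg") -> str:
    """Build a caps string with an explicit fractional framerate (hypothetical helper)."""
    rate = _NTSC.get(round(fps, 3)) or Fraction(fps).limit_denominator(1001)
    if rate <= 0:
        # a 0 fps rate is what the H.264 encoder rejects in this thread
        raise ValueError("framerate must be positive")
    return f"{media_type},framerate={rate.numerator}/{rate.denominator}"

print(framerate_caps(30))
print(framerate_caps(29.97))
```

The resulting string goes between two elements in a gst-launch-1.0 pipeline description, e.g. `... ! image/jpeg,framerate=30/1 ! ...`, which gst-launch-1.0 turns into a capsfilter.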

joanxie
NXP TechSupport

I got a reply from the R&D team:

"When the final sink is filesink, GStreamer needs to do 'gst_video_frame_copy', which causes high CPU load. It's the key bottleneck, and I don't have a good solution yet.

Samples: 36K of event 'cpu-clock', Event count (approx.): 9019500000
Children Self Command Shared Object Symbol
+ 39.97% 0.00% queue0:src libc-2.33.so [.] thread_start
+ 39.97% 0.00% queue0:src libpthread-2.33.so [.] start_thread
+ 39.97% 0.00% queue0:src libglib-2.0.so.0.6600.7 [.] 0x0000ffffa8e88264
+ 39.97% 0.00% queue0:src libglib-2.0.so.0.6600.7 [.] 0x0000ffffa8e88d48
+ 39.97% 0.00% queue0:src libgstreamer-1.0.so.0.1800.0 [.] 0x0000ffffa9012594
+ 39.89% 0.00% queue0:src libgstcoreelements.so [.] 0x0000ffffa8a66704
+ 39.89% 0.00% queue0:src libgstreamer-1.0.so.0.1800.0 [.] 0x0000ffffa8fd89c8
+ 39.89% 0.00% queue0:src libgstreamer-1.0.so.0.1800.0 [.] 0x0000ffffa8fd6b40
+ 39.87% 0.00% queue0:src libgstvideo-1.0.so.0.1800.0 [.] 0x0000ffffa87faa04
+ 39.16% 0.00% queue0:src libgstvideo4linux2.so [.] 0x0000ffffa86d7d08
+ 38.96% 0.00% queue0:src libgstvideo4linux2.so [.] 0x0000ffffa86c91c8
+ 38.96% 0.00% queue0:src libgstvideo4linux2.so [.] 0x0000ffffa86c7c74
+ 38.83% 0.00% queue0:src libgstvideo4linux2.so [.] 0x0000ffffa86c5058
+ 38.83% 0.00% queue0:src libgstvideo-1.0.so.0.1800.0 [.] gst_video_frame_copy
+ 38.81% 38.76% queue0:src libc-2.33.so [.] __memcpy_generic

+ 26.68% 0.00% v4l2jpegdec0:sr libc-2.33.so [.] thread_start
+ 26.68% 0.00% v4l2jpegdec0:sr libpthread-2.33.so [.] start_thread

"
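To put the gst_video_frame_copy cost in perspective, this sketch estimates how many bytes per second must be copied for 1920x1080 raw frames. The pixel format at the copy point is not stated in the profile, so two common formats are shown as assumptions: YUY2 (2 bytes/pixel, the format used in the customer's pipeline) and NV12 (1.5 bytes/pixel). Stride padding is ignored.

```python
# Bytes per pixel for two common raw formats (ASSUMPTION: the copied frames use one of these).
BYTES_PER_PIXEL = {"YUY2": 2.0, "NV12": 1.5}

def frame_bytes(width: int, height: int, fmt: str) -> int:
    """Size of one raw frame, ignoring stride padding."""
    return int(width * height * BYTES_PER_PIXEL[fmt])

def copy_rate_mb_s(width: int, height: int, fmt: str, fps: float, copies: int = 1) -> float:
    """MB/s (10**6 bytes) to copy at `fps` with `copies` full-frame memcpys per frame."""
    return frame_bytes(width, height, fmt) * fps * copies / 1e6

for fmt in BYTES_PER_PIXEL:
    for fps in (30, 60):
        print(f"{fmt} @ {fps} fps: {copy_rate_mb_s(1920, 1080, fmt, fps):.1f} MB/s")
```

On paper these rates look modest, but the profile above attributes roughly 39% of samples to this one copy on the queue0 thread, which is consistent with the R&D conclusion that it is the bottleneck.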

 

gabrielvalcazar
Contributor IV

Thanks for the feedback @joanxie , it's good to know which element of the pipeline is causing the most overhead.

Even if there's no solution yet, does the R&D team intend to investigate this? Would it be feasible to work around this delay using an alternative pipeline?

Many thanks,

Gabriel

joanxie
NXP TechSupport

The update:

"Please replace libgstvideo4linux2.so in /usr/lib/gstreamer-1.0/ with the attached one.
Then try this command:
> gst-launch-1.0 -ve filesrc location=OBC_mjpeg.avi ! avidemux ! v4l2video1jpegdec ! queue ! v4l2h264enc ! h264parse ! mp4mux ! filesink location=OBC_h264.mp4

It skips the imxvideoconvert_g2d and v4l2convert operations. Performance looks good on my side, generally about 55 fps, except in rare cases (30 fps) caused by high CPU load when copying encoded frames to the output file."
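A cautious way to follow the "replace libgstvideo4linux2.so" step is to keep a copy of the stock plugin so it can be restored if the patched one misbehaves. This is a sketch using only the Python standard library; the plugin directory and file name come from this thread, while the separate backup directory and the `.orig` suffix are arbitrary choices for illustration.

```python
import shutil
from pathlib import Path

def replace_plugin(new_lib: Path, plugin_dir: Path, backup_dir: Path,
                   name: str = "libgstvideo4linux2.so") -> Path:
    """Install new_lib as plugin_dir/name, saving the first original to backup_dir.

    An existing backup is never overwritten, so running this twice does not lose
    the stock library. Returns the backup path.
    """
    target = plugin_dir / name
    backup = backup_dir / (name + ".orig")
    backup_dir.mkdir(parents=True, exist_ok=True)
    if target.exists() and not backup.exists():
        shutil.copy2(target, backup)   # preserve the stock plugin once
    shutil.copy2(new_lib, target)      # install the patched plugin
    return backup

def restore_plugin(plugin_dir: Path, backup_dir: Path,
                   name: str = "libgstvideo4linux2.so") -> None:
    """Put the stock plugin back from its backup."""
    shutil.copy2(backup_dir / (name + ".orig"), plugin_dir / name)

# On the target, as root (the backup location is a hypothetical choice):
# replace_plugin(Path("libgstvideo4linux2.so"), Path("/usr/lib/gstreamer-1.0"), Path("/root/plugin-backup"))
```

After swapping, running `gst-inspect-1.0 v4l2video1jpegdec` is a quick check that the element from the new plugin is visible before trying the pipeline.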

HectorPalacios
Senior Contributor I

Hi @joanxie 

I tested on the MEK (B0) with images from v5.10. With the default images, the transcoding takes 11 s (for an 8 s video), but the CPUs still reach 100% load.

I don't see any difference with the library you provided: it still takes 11 s and the CPU load is similar. What exactly did you do with the library?

Regards

joanxie
NXP TechSupport

Please unzip the attached libgstvideo4linux2.so and copy it to /usr/lib/gstreamer-1.0/, replacing the existing file.
Then try this command:

gst-launch-1.0 -ve filesrc location=OBC_mjpeg.avi ! avidemux ! v4l2video1jpegdec ! queue ! v4l2h264enc ! queue ! h264parse ! avimux ! filesink location=OBC_h264.avi

The test time is close to 8 s.

Execution ended after 0:00:09.835277750
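Since timings in this thread are compared using the "Execution ended after" line that gst-launch-1.0 prints at the end of a run, a small parser makes it easy to collect those numbers from several runs and express them relative to the clip length. The regular expression assumes the H:MM:SS.fraction layout shown above; the 8 s clip length is taken from this thread.

```python
import re

_EXEC_RE = re.compile(r"Execution ended after (\d+):(\d{2}):(\d{2})\.(\d+)")

def execution_seconds(log: str) -> float:
    """Extract wall-clock seconds from an 'Execution ended after H:MM:SS.fraction' line."""
    m = _EXEC_RE.search(log)
    if m is None:
        raise ValueError("no 'Execution ended after' line found")
    hours, minutes, seconds, frac = m.groups()
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(frac) / 10 ** len(frac)

t = execution_seconds("Execution ended after 0:00:09.835277750")
print(f"{t:.3f} s, {t / 8.0:.2f}x the 8 s clip")
```

For the run quoted above, that is about 9.835 s, or roughly 1.23 times the clip length.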

-------------------------------------------------------------------------------------------------------

I've sent you the test result again. Would you mind testing again with this library? If it still takes 11 s, could you send me the result in the same form as ours, e.g. "Execution ended after 0:00:09.835277750"?

HectorPalacios
Senior Contributor I

Hello @joanxie 

With the new file, on a MEK (B0) with v5.10 BSP and your pipeline I do get 9.94s.

Unfortunately, the customer uses the v5.4 BSP. Any chance of getting the library with these changes built for v5.4?

Thank you very much for your time.

joanxie
NXP TechSupport

Has the customer ever tested this library on the 5.4 kernel? Did it fail? Were there any errors?

 

HectorPalacios
Senior Contributor I

Your library is built with libc 2.33 while Yocto gatesgarth uses 2.32, so the library doesn't load.

Even if I upgrade libc to 2.33 so that the library loads without complaint, I can't see the v4l2video1jpegdec plugin, so the pipeline doesn't work.
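The libc mismatch above can be spotted before copying a prebuilt plugin onto a board. This sketch is a heuristic, not an ELF parser: it scans a shared library for GLIBC_x.y symbol-version strings (the versions that `objdump -T` lists) and compares the highest one with the running C library as reported by `os.confstr("CS_GNU_LIBC_VERSION")`. A missing element such as v4l2video1jpegdec can also have causes other than libc.

```python
import os
import re
from pathlib import Path
from typing import Optional, Tuple

Version = Tuple[int, ...]
_GLIBC_RE = re.compile(rb"GLIBC_(\d+)\.(\d+)(?:\.(\d+))?")

def required_glibc(lib_bytes: bytes) -> Optional[Version]:
    """Highest GLIBC_x.y[.z] version string found in a binary (heuristic)."""
    found = [tuple(int(p) for p in m.groups() if p is not None)
             for m in _GLIBC_RE.finditer(lib_bytes)]
    return max(found) if found else None

def running_glibc() -> Optional[Version]:
    """glibc version of this system, e.g. (2, 32); None if unavailable."""
    try:
        value = os.confstr("CS_GNU_LIBC_VERSION")  # e.g. "glibc 2.32"
    except (AttributeError, ValueError, OSError):
        return None
    if not value or not value.startswith("glibc "):
        return None
    return tuple(int(p) for p in value.split()[1].split("."))

def check_library(path: Path) -> str:
    """Return 'ok', 'unknown', or the glibc version the library needs."""
    need, have = required_glibc(path.read_bytes()), running_glibc()
    if need is None or have is None:
        return "unknown"
    return "ok" if have >= need else f"needs glibc {'.'.join(map(str, need))}"

# Usage on the board:
# print(check_library(Path("/usr/lib/gstreamer-1.0/libgstvideo4linux2.so")))
```

In this thread's case, a plugin built against glibc 2.33 would be flagged on a gatesgarth image with glibc 2.32.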

joanxie
NXP TechSupport

The development team successfully tested this on the 5.4 BSP with a C0 board, so it seems the 5.4 BSP can support this library. Would the customer mind downloading the 5.4 BSP again and retesting the library?

If it still fails, please send me a screenshot or log file.

 

HectorPalacios
Senior Contributor I

Hi @joanxie 

You're right, with the latest attached libgstvideo4linux2.so on our v5.4 BSP and using your pipeline

gst-launch-1.0 -ve filesrc location=OBC_mjpeg.avi ! avidemux ! v4l2video1jpegdec ! queue ! v4l2h264enc ! queue ! h264parse ! avimux ! filesink location=OBC_h264.avi

the transcoding takes 10.41 s.

It's definitely better, but not real time. Would it ever be possible to get this working in real time, or do you think that's not possible?

Thank you!

joanxie
NXP TechSupport

So this time the library works on 5.4, right? With the same library, the transcoding time is 9.94 s on 5.10 but 10.41 s on 5.4, right?

HectorPalacios
Senior Contributor I

Correct, but to clarify:

  • The 10.41 s were obtained on v5.4 on our board (C0 with 2GB RAM)
  • The 9.94 s were obtained on v5.10 on the MEK (B0 with 3GB RAM)
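The two patched-plugin results differ in both BSP and hardware, so the gap cannot be attributed to either alone. This sketch collects the wall-clock times reported in this thread for the roughly 8 s clip and expresses each as a multiple of real time; the labels paraphrase the posts and are not official test names.

```python
# Wall-clock transcode times reported in this thread for the ~8 s clip.
CLIP_S = 8.0
runs = {
    "original pipeline (C0, v5.4)": 24.0,
    "with queues (C0, v5.4)": 17.0,
    "MEK B0, v5.10, default images": 11.0,
    "NXP test, patched plugin": 9.835,
    "MEK B0, v5.10, patched plugin": 9.94,
    "C0 2GB, v5.4, patched plugin": 10.41,
}

def realtime_factor(seconds: float, clip_s: float = CLIP_S) -> float:
    """Transcode time as a multiple of the clip length (1.0 means real time)."""
    return seconds / clip_s

def pct_slower(a: float, b: float) -> float:
    """How much slower a is than b, in percent."""
    return (a / b - 1.0) * 100.0

for name, secs in runs.items():
    print(f"{name:32s} {secs:6.2f} s  {realtime_factor(secs):.2f}x real time")
print(f"C0/v5.4 vs MEK/v5.10: {pct_slower(10.41, 9.94):.1f}% slower")
```

Every patched-plugin run is still roughly 1.2 to 1.3 times real time, and the C0/v5.4 result is about 4.7% slower than the MEK/v5.10 one.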

joanxie
NXP TechSupport

The expert team tested a C0 board with the 5.4.70-2.3.0 BSP and the result was around 9 s. Please test with this BSP again; if you still get around 10 s, this shouldn't be a library issue.
