Hi all,
I recently got VPU transcoding to work on my i.MX8X-based board after some device tree changes. See the original thread here: https://community.nxp.com/t5/i-MX-Processors/i-MX8X-transcode-m-jpeg-video-to-h264-with-gstreamer-us...
I'm using a custom C0 i.MX8X board with a custom Yocto Linux based on NXP's v5.4.70_2.3.0 BSP. The gstreamer pipeline I'm currently using for the transcoding operation is the following:
gst-launch-1.0 -ve filesrc location=OBC_mjpeg.avi ! avidemux ! v4l2jpegdec ! imxvideoconvert_g2d ! v4l2convert ! v4l2h264enc ! h264parse ! mp4mux ! filesink location=OBC_h264.mp4
This works, but it takes three times as long as the video duration. For example, my source video is 8 seconds long, and the transcoding process takes about 24 seconds.
Is it possible to transcode in real time (or practically real time)? If so, what GStreamer pipeline would reach such speeds?
Thanks in advance,
Gabriel
Hi all,
Although there has been some progress since this thread was originally created, we are still unable to get transcoding on the 8X fast enough for a real-time scenario. We're now exploring other alternatives, such as using the 8M Mini instead (with the same OS based on NXP's v5.4.70_2.3.0 Linux BSP).
Given the tests we've performed so far, it seems like the transcoding is much faster on this platform (we're using the sample_1920x1080.mjpeg reference video as a base). However, there are some things we'd like to ask regarding the 8M Mini's VPU usage and the resulting videos:
Attached are the pipeline graphs for each of the pipelines mentioned above.
Let me know if it's OK to discuss the 8M Mini transcoding performance in this thread or if I should create a separate one.
Gabriel
Hi @joanxie ,
Could we get some feedback regarding the i.MX8M Mini transcoding speed, even if in a separate thread? We have customers who rely on this feature, and we would like to be able to tell them confidently whether it's feasible.
Hi again,
Any advice regarding the transcoding speed? This is still an issue that I'd like to solve.
Is the slow speed I'm seeing a hard limitation or is it something that can be improved with a different pipeline and/or driver changes?
Best regards,
Gabriel
Try adding queue elements to the pipeline to improve the performance, like:
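A sketch of what that could look like, taking the original pipeline from earlier in the thread and inserting queues between the decode, convert, and encode stages (the exact queue placement is my assumption, not confirmed in this thread; this needs an i.MX board to actually run):

```shell
# Hypothetical variant of the earlier pipeline with queue elements added,
# so that demux, decode, colour conversion, and encode each get their own
# streaming thread instead of running serially in one thread.
gst-launch-1.0 -ve filesrc location=OBC_mjpeg.avi ! avidemux ! queue ! \
  v4l2jpegdec ! queue ! imxvideoconvert_g2d ! v4l2convert ! queue ! \
  v4l2h264enc ! h264parse ! mp4mux ! filesink location=OBC_h264.mp4
```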
Hi @joanxie ,
Thanks for the queue suggestion. It definitely changes the behavior of the transcoding process, but I'm afraid it's still not fast enough.
For one, even though it's faster than before, it still takes a lot of time (17 seconds versus 28). I also noticed that the output video is double the length of the original and is at 30 FPS instead of 60 FPS, even though it appears to play at the correct speed.
Here's some feedback from the customer that is interested in the real time transcoding feature, with more detailed information:
Regarding the framerate of the video file, I can confirm that it is intended to be 30 FPS and approximately 8 seconds long. I have, however, observed it playing at 60 FPS for some reason, even though this is not correct. I previously linked another MJPEG file that could be used as an alternative test, as it was encoded by a third party and appears to have no framerate issues, but it is also still slow to transcode. For ease, here is the link again, along with the other issues I observed at the time:
As you discovered, the video is not processed fast enough to be used in real-time. I have confirmed this with another recorded MJPEG file (https://filesamples.com/samples/video/mjpeg/sample_1920x1080.mjpeg) and with an MJPEG IP camera. I can also confirm that the CPU is not overloaded at all when doing this. The pipeline I have used to transcode the above file is as follows:
gst-launch-1.0 -e filesrc location=sample_1920x1080.mjpeg ! jpegparse ! v4l2jpegdec ! imxvideoconvert_g2d ! video/x-raw,format=YUY2 ! imxvideoconvert_g2d ! v4l2h264enc ! h264parse ! mp4mux ! filesink location=test.mp4
When streaming from an MJPEG IP camera, it appears necessary to manually specify the video framerate in a caps filter. Otherwise, the JPEG decoder consistently reports a framerate of 0 fps, which the H.264 encoder does not support. However, this is easy to work around.
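A sketch of that workaround for the IP-camera case, forcing the framerate with a caps filter after the JPEG parser; the camera URL and the 30/1 rate below are placeholders I made up, not values from this thread, and the pipeline requires the i.MX hardware elements to run:

```shell
# Hypothetical MJPEG IP camera pipeline. The image/jpeg caps filter forces
# a nonzero framerate so the downstream H.264 encoder accepts the stream.
gst-launch-1.0 souphttpsrc location=http://camera.local/mjpeg ! multipartdemux ! \
  jpegparse ! image/jpeg,framerate=30/1 ! v4l2jpegdec ! \
  imxvideoconvert_g2d ! video/x-raw,format=YUY2 ! imxvideoconvert_g2d ! \
  v4l2h264enc ! h264parse ! mp4mux ! filesink location=camera.mp4
```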
When directly linking the JPEG decoder to the H.264 encoder, the output video is corrupted because the colourspace is interpreted incorrectly by the H.264 encoder. For me, this occurs with the following pipeline using the sample video linked above:
gst-launch-1.0 filesrc location=sample_1920x1080.mjpeg ! jpegparse ! v4l2jpegdec ! v4l2h264enc ! h264parse ! mp4mux ! filesink location=test.mp4
However, this can be worked around by converting to YUY2 format and back as in the original pipeline for this sample.
It does not seem possible to convert the output of the MJPEG decoder to any RGB format (RGBA, RGBx, xRGB, BGRx, etc.) and then re-encode it as H.264. The error given in this case is:
g2d_opencl_conversion opencl conversion does not support input format 1 (this number varies from 1-8 depending on the RGB format selected)
The library causing this error appears to be imx-dpu-g2d, but I cannot investigate further as it is closed-source.
I got a reply from the R&D team:
"When endsink is filesink, gstreamer need to do 'gst_video_frame_copy', that cost high CPU loading. It's the key bottleneck, I don't have a good solution yet.
Samples: 36K of event 'cpu-clock', Event count (approx.): 9019500000
Children Self Command Shared Object Symbol
+ 39.97% 0.00% queue0:src libc-2.33.so [.] thread_start
+ 39.97% 0.00% queue0:src libpthread-2.33.so [.] start_thread
+ 39.97% 0.00% queue0:src libglib-2.0.so.0.6600.7 [.] 0x0000ffffa8e88264
+ 39.97% 0.00% queue0:src libglib-2.0.so.0.6600.7 [.] 0x0000ffffa8e88d48
+ 39.97% 0.00% queue0:src libgstreamer-1.0.so.0.1800.0 [.] 0x0000ffffa9012594
+ 39.89% 0.00% queue0:src libgstcoreelements.so [.] 0x0000ffffa8a66704
+ 39.89% 0.00% queue0:src libgstreamer-1.0.so.0.1800.0 [.] 0x0000ffffa8fd89c8
+ 39.89% 0.00% queue0:src libgstreamer-1.0.so.0.1800.0 [.] 0x0000ffffa8fd6b40
+ 39.87% 0.00% queue0:src libgstvideo-1.0.so.0.1800.0 [.] 0x0000ffffa87faa04
+ 39.16% 0.00% queue0:src libgstvideo4linux2.so [.] 0x0000ffffa86d7d08
+ 38.96% 0.00% queue0:src libgstvideo4linux2.so [.] 0x0000ffffa86c91c8
+ 38.96% 0.00% queue0:src libgstvideo4linux2.so [.] 0x0000ffffa86c7c74
+ 38.83% 0.00% queue0:src libgstvideo4linux2.so [.] 0x0000ffffa86c5058
+ 38.83% 0.00% queue0:src libgstvideo-1.0.so.0.1800.0 [.] gst_video_frame_copy
+ 38.81% 38.76% queue0:src libc-2.33.so [.] __memcpy_generic
+ 26.68% 0.00% v4l2jpegdec0:sr libc-2.33.so [.] thread_start
+ 26.68% 0.00% v4l2jpegdec0:sr libpthread-2.33.so [.] start_thread
"
Thanks for the feedback @joanxie , it's good to know which element of the pipeline is causing the most overhead.
Even if there's no solution yet, does the R&D team intend to investigate this? Would it be feasible to work around this delay using an alternative pipeline?
Many thanks,
Gabriel
The update:
"Please replace libgstvideo4linux2.so to /usr/lib/gstreamer-1.0/
Then try cmd:
> gst-launch-1.0 -ve filesrc location=OBC_mjpeg.avi ! avidemux ! v4l2video1jpegdec ! queue ! v4l2h264enc ! h264parse ! mp4mux ! filesink location=OBC_h264.mp4
It skips the imxvideoconvert_g2d and v4l2convert operations; performance looks good on my side, generally about 55 fps, except in rare cases (30 fps) caused by high CPU load when copying encoded frames to the output file."
Hi @joanxie
I tested on the MEK (B0) with images from v5.10. With the default images, the transcoding takes 11 s (for an 8 s video), but the CPUs still reach 100% load.
I don't see any difference with the library you provided: still 11 s and similar CPU load. What exactly did you do to the library?
Regards
Please unzip the attached libgstvideo4linux2.so and copy it to /usr/lib/gstreamer-1.0/.
Then try cmd:
gst-launch-1.0 -ve filesrc location=OBC_mjpeg.avi ! avidemux ! v4l2video1jpegdec ! queue ! v4l2h264enc ! queue ! h264parse ! avimux ! filesink location=OBC_h264.avi
The test time is close to 8 s:
Execution ended after 0:00:09.835277750
-------------------------------------------------------------------------------------------------------
I sent you the test result again. Do you mind testing again with this library? If it still takes 11 s, could you send the result to me? Our test result is "Execution ended after 0:00:09.835277750".
Hello @joanxie
With the new file, on a MEK (B0) with the v5.10 BSP and your pipeline, I do get 9.94 s.
Unfortunately, the customer uses the v5.4 BSP. Any chance of getting the library with these changes built for v5.4?
Thank you very much for your time.
Your library is built with libc 2.33, while Yocto Gatesgarth uses 2.32, so the library doesn't load.
Even after upgrading libc to 2.33 so the library no longer complains, I can't see the v4l2video1jpegdec plugin, so the pipeline doesn't work.
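One thing worth trying in this situation, sketched with standard GStreamer tooling (not specific to this library): clear GStreamer's registry cache after swapping the .so, then re-inspect to see whether the element registers or the plugin is blacklisted.

```shell
# Force GStreamer to rescan all plugins after replacing
# libgstvideo4linux2.so, instead of trusting the cached registry.
rm -rf ~/.cache/gstreamer-1.0

# Check whether the expected element is now registered.
gst-inspect-1.0 v4l2video1jpegdec

# List blacklisted plugins; a plugin that fails to load (e.g. due to a
# libc version mismatch) shows up here.
gst-inspect-1.0 -b
```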
The development team tested this on the 5.4 BSP with a C0 board successfully, so it seems the 5.4 BSP can support this library. Would the customer mind downloading the 5.4 BSP again and testing the library once more?
If it still fails, send a screenshot or log file to me.
Hi @joanxie
You're right, with the latest attached libgstvideo4linux2.so on our v5.4 BSP and using your pipeline
gst-launch-1.0 -ve filesrc location=OBC_mjpeg.avi ! avidemux ! v4l2video1jpegdec ! queue ! v4l2h264enc ! queue ! h264parse ! avimux ! filesink location=OBC_h264.avi
the transcoding takes 10.41 s.
It's definitely better, but not real time. Would it ever be possible to get this working in real time, or do you think that's not achievable?
Thank you!
Correct, but to clarify: