<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: High CPU usage when sharing raw camera video across multiple processes using GStreamer shmsink/shmsrc in i.MX Processors</title>
    <link>https://community.nxp.com/t5/i-MX-Processors/High-CPU-usage-when-sharing-raw-camera-video-across-multiple/m-p/2325682#M244259</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.nxp.com/t5/user/viewprofilepage/user-id/256223"&gt;@SiddavatamVishnu&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;1. Is shmsink/shmsrc inherently copy‑heavy and memory‑bandwidth limited?&lt;/STRONG&gt;&lt;BR /&gt;Yes. shmsink and shmsrc are fundamentally copy‑heavy because they serialize full raw video buffers through shared memory rather than passing DMA‑BUF file descriptors. Every frame must be copied into the shared memory region by the producer, and each consumer must then copy the frame out again into its own pipeline. At 1080p NV12 (~3 MB per frame), that is one copy in plus one copy out per consumer, per frame, which quickly saturates memory bandwidth on embedded platforms like the i.MX8. Because these elements cannot transfer zero‑copy buffer handles, their architecture inherently scales poorly when distributing high‑resolution raw frames.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;2. Is there a way to share DMABUF across processes without copying?&lt;/STRONG&gt;&lt;BR /&gt;Sharing DMA‑BUF across processes is technically possible, but only through mechanisms that pass file descriptors directly between processes, such as UNIX domain sockets. However, GStreamer elements must explicitly support importing and exporting DMA‑BUF FDs for this to work. Most glue elements, including shmsink, shmsrc, appsink, and appsrc, do not support FD passing, so they always fall back to memory‑based copies. To achieve true inter‑process zero‑copy DMA‑BUF sharing, you would need custom code or specialized elements that handle DMA‑BUF FD passing, plus a custom allocator and buffer‑lifetime management. This is feasible but significantly complex and not supported by standard GStreamer elements.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;3. Would encoding once and sharing a compressed stream be the only scalable solution?&lt;/STRONG&gt;&lt;BR /&gt;For multi‑process architectures on i.MX8, encoding once and distributing a compressed stream is indeed the only scalable and practical solution. Hardware video encoders such as v4l2h264enc impose minimal CPU cost and drastically reduce bandwidth requirements, making it efficient to pass the stream to multiple consumer processes. Each consumer can then either forward the encoded data directly or decode it once using the hardware decoder. This approach avoids the high memory footprint and repeated buffer copies inherent in raw‑frame fan‑out designs, making it the architecture most commonly used in commercial embedded products.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;4. Are there NXP/i.MX specific mechanisms (DMA‑BUF export, v4l2 memory sharing, imx plugins) better suited for this?&lt;/STRONG&gt;&lt;BR /&gt;NXP’s i.MX8 platform provides solid mechanisms for zero‑copy GPU/VPU processing, such as V4L2 DMA‑BUF import/export, the G2D/GC7000 GPU, and zero‑copy GStreamer plugins, but these mechanisms only work inside a single process. Zero‑copy through DMA‑BUF works well between elements like v4l2src, v4l2convert, and v4l2h264enc, but does not extend across processes unless you manually implement FD passing. The NXP‑specific GStreamer plugins (e.g., imxv4l2videosrc, imxg2dvideoscale) likewise assume in‑process zero‑copy pipelines. Therefore, while these tools help within one pipeline, they do not solve the problem of multi‑process raw video distribution.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;5. Is there a recommended design pattern for this use case on embedded systems?&lt;/STRONG&gt;&lt;BR /&gt;Yes, the widely recommended pattern is to avoid sharing raw frames across processes and instead perform a single hardware encode in the producer, then distribute the compressed stream to as many consumers as needed. This minimizes memory bandwidth, avoids unnecessary raw buffer copies, and keeps processing efficiency aligned with the hardware capabilities of the i.MX8. If true raw access is required by more than one subsystem, then the system is typically designed to keep all raw‑processing components inside a single GStreamer pipeline using tee, while other processes interact with the system through control-plane IPC rather than consuming raw video. This pattern achieves optimal performance, scalability, and system isolation without overloading memory or CPU resources.&lt;/P&gt;</description>
    <pubDate>Tue, 03 Mar 2026 17:28:18 GMT</pubDate>
    <dc:creator>Chavira</dc:creator>
    <dc:date>2026-03-03T17:28:18Z</dc:date>
    <item>
      <title>High CPU usage when sharing raw camera video across multiple processes using GStreamer shmsink/shmsrc</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/High-CPU-usage-when-sharing-raw-camera-video-across-multiple/m-p/2324940#M244230</link>
      <description>&lt;H1&gt;Question Body&lt;/H1&gt;&lt;P&gt;I am working on an embedded Linux project (i.MX8 platform) where I need to share &lt;STRONG&gt;raw camera video across multiple processes&lt;/STRONG&gt; using a producer/consumer architecture.&lt;/P&gt;&lt;H2&gt;Architecture&lt;/H2&gt;&lt;P&gt;I have:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Producer process&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Captures the camera using v4l2src&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Sends raw NV12 1080p30 frames to shared memory using shmsink&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Multiple consumer processes&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;One records to file&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;One performs AI inference&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;One (or more) provides RTSP streaming&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;All consume raw frames via shmsrc&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Example producer pipeline:&lt;/P&gt;&lt;PRE&gt;gst-launch-1.0 \
v4l2src device=/dev/video3 io-mode=dmabuf ! \
video/x-raw,format=NV12,width=1920,height=1080,framerate=30/1 ! \
queue ! \
shmsink socket-path=/tmp/cam.sock wait-for-connection=false sync=false&lt;/PRE&gt;&lt;P&gt;Example consumer (RTSP branch):&lt;/P&gt;&lt;PRE&gt;shmsrc socket-path=/tmp/cam.sock is-live=true do-timestamp=true ! \
video/x-raw,format=NV12,width=1920,height=1080,framerate=30/1 ! \
videoscale ! videorate ! \
v4l2h264enc ! h264parse ! rtph264pay&lt;/PRE&gt;&lt;HR /&gt;&lt;H2&gt;Problem&lt;/H2&gt;&lt;P&gt;CPU usage is extremely high.&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Producer alone consumes ~100% of one core&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Producer + 3 RTSP branches consume ~360% of 400% total CPU (quad-core system)&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;This is unexpected because:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Encoding is hardware accelerated (v4l2h264enc)&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Capture uses io-mode=dmabuf&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;No software decoding is involved&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;However, sharing raw frames across processes appears very expensive.&lt;/P&gt;&lt;HR /&gt;&lt;H2&gt;Observations&lt;/H2&gt;&lt;OL&gt;&lt;LI&gt;&lt;P&gt;Each branch reads raw 1080p frames from shared memory.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Each branch performs scaling and framerate conversion independently.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Shared memory causes one memory copy per branch.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Total memory bandwidth becomes very high:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;1920×1080×1.5 bytes ≈ 3 MB per frame&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;3 MB × 30 fps × multiple branches&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;Using tee inside a single process reduces CPU usage significantly, but:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;A static tee does not meet my requirements.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;I need dynamic branch creation and removal.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;I need true inter-process separation (producer/consumer model).&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;appsrc introduces heavy copying and does not share buffers across processes.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;I want to strictly avoid re-encoding and re-decoding between processes.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;UDP/RTP transport is not preferred because it requires encoding.&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;HR /&gt;&lt;H2&gt;Question&lt;/H2&gt;&lt;P&gt;What is the best architecture to:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Share &lt;STRONG&gt;raw camera video across multiple processes&lt;/STRONG&gt;&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Avoid excessive CPU usage&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Avoid repeated encoding/decoding&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Allow dynamic branch creation&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Maintain process isolation&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Specifically:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;&lt;P&gt;Is shmsink/shmsrc inherently copy-heavy and memory-bandwidth limited?&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Is there a way to share DMABUF across processes without copying?&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Would encoding once and sharing a compressed stream be the only scalable solution?&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Are there NXP/i.MX-specific mechanisms (DMA-BUF export, v4l2 memory sharing, imx plugins) better suited for this?&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Is there a recommended design pattern for this use case on embedded systems?&lt;/P&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;HR /&gt;&lt;H2&gt;Goal&lt;/H2&gt;&lt;P&gt;My goal is to:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Avoid multiple encode/decode cycles&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Avoid unnecessary copies&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Keep CPU usage minimal&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Support dynamic consumer processes&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;HR /&gt;&lt;P&gt;Any architectural guidance or NXP-specific recommendations would be highly appreciated.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Producer command&lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;gst-launch-1.0 v4l2src device=/dev/video3 io-mode=dmabuf ! video/x-raw,format=NV12,width=1920,height=1080,framerate=30/1 ! queue ! shmsink socket-path=/tmp/test_shm.sock wait-for-connection=false sync=false shm-size=200000000&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;Consumer Code&lt;/STRONG&gt;: &lt;A href="https://drive.google.com/file/d/17BlrFffcaIiQPmrFkBOM_8xFntrOMcym/view?usp=sharing" target="_self"&gt;consumer_python.py&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 02 Mar 2026 10:30:09 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/High-CPU-usage-when-sharing-raw-camera-video-across-multiple/m-p/2324940#M244230</guid>
      <dc:creator>SiddavatamVishnu</dc:creator>
      <dc:date>2026-03-02T10:30:09Z</dc:date>
    </item>
    <item>
      <title>Re: High CPU usage when sharing raw camera video across multiple processes using GStreamer shmsink/shmsrc</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/High-CPU-usage-when-sharing-raw-camera-video-across-multiple/m-p/2325682#M244259</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.nxp.com/t5/user/viewprofilepage/user-id/256223"&gt;@SiddavatamVishnu&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;1. Is shmsink/shmsrc inherently copy‑heavy and memory‑bandwidth limited?&lt;/STRONG&gt;&lt;BR /&gt;Yes. shmsink and shmsrc are fundamentally copy‑heavy because they serialize full raw video buffers through shared memory rather than passing DMA‑BUF file descriptors. Every frame must be copied into the shared memory region by the producer, and each consumer must then copy the frame out again into its own pipeline. At 1080p NV12 (~3 MB per frame), that is one copy in plus one copy out per consumer, per frame, which quickly saturates memory bandwidth on embedded platforms like the i.MX8. Because these elements cannot transfer zero‑copy buffer handles, their architecture inherently scales poorly when distributing high‑resolution raw frames.&lt;/P&gt;
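The bandwidth arithmetic behind that statement can be checked in a few lines of Python (a rough sketch; the copy count assumes one copy in by the producer and one copy out per consumer, which matches shmsink/shmsrc behavior but ignores any extra copies inside the branches):

```python
# Back-of-the-envelope memory bandwidth for raw NV12 fan-out via shmsink/shmsrc.
width, height, fps = 1920, 1080, 30
bytes_per_pixel = 1.5       # NV12: 1 byte luma + 0.5 byte chroma per pixel
consumers = 3               # e.g., the three RTSP branches from the question

frame_bytes = width * height * bytes_per_pixel      # ~3.1 MB per frame
copies_per_frame = 1 + consumers                    # producer copy-in + one copy-out per consumer
total_bw = frame_bytes * fps * copies_per_frame     # bytes/second moved through shared memory

print(f"frame size: {frame_bytes / 1e6:.1f} MB")    # frame size: 3.1 MB
print(f"bandwidth : {total_bw / 1e6:.0f} MB/s")     # bandwidth : 373 MB/s
```

Roughly 373 MB/s of pure memcpy traffic at 30 fps with three consumers, before any scaling or color conversion, which is why the CPU load you measured is dominated by memory movement rather than encoding.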
&lt;P&gt;&lt;STRONG&gt;2. Is there a way to share DMABUF across processes without copying?&lt;/STRONG&gt;&lt;BR /&gt;Sharing DMA‑BUF across processes is technically possible, but only through mechanisms that pass file descriptors directly between processes, such as UNIX domain sockets. However, GStreamer elements must explicitly support importing and exporting DMA‑BUF FDs for this to work. Most glue elements, including shmsink, shmsrc, appsink, and appsrc, do not support FD passing, so they always fall back to memory‑based copies. To achieve true inter‑process zero‑copy DMA‑BUF sharing, you would need custom code or specialized elements that handle DMA‑BUF FD passing, plus a custom allocator and buffer‑lifetime management. This is feasible but significantly complex and not supported by standard GStreamer elements.&lt;/P&gt;
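To illustrate only the OS-level mechanism (not a GStreamer element), this is the SCM_RIGHTS file-descriptor transfer that a DMA-BUF FD would also ride on; here a plain pipe FD stands in for a DMA-BUF, since both are just file descriptors to the kernel:

```python
# Minimal sketch of passing a file descriptor between processes over a
# UNIX domain socket. A DMA-BUF FD would be sent the same way; wiring the
# received FD into a GStreamer pipeline still requires a custom element
# and allocator, as described above.
import os
import socket

producer_sock, consumer_sock = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# "Producer" side: a pipe read end stands in for a DMA-BUF FD.
r, w = os.pipe()
os.write(w, b"frame-data")
socket.send_fds(producer_sock, [b"buf0"], [r])   # Python 3.9+

# "Consumer" side: the kernel duplicates the FD into the receiver, so the
# underlying buffer is reachable without copying it through the socket.
msg, fds, _flags, _addr = socket.recv_fds(consumer_sock, 1024, maxfds=1)
payload = os.read(fds[0], 1024)
print(msg, payload)   # b'buf0' b'frame-data'
```

Only a few bytes of metadata cross the socket; the buffer itself never does. The hard part in a real system is not this transfer but the buffer-lifetime protocol (when may the producer reuse a buffer the consumer still maps), which is exactly the complexity noted above.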
&lt;P&gt;&lt;STRONG&gt;3. Would encoding once and sharing a compressed stream be the only scalable solution?&lt;/STRONG&gt;&lt;BR /&gt;For multi‑process architectures on i.MX8, encoding once and distributing a compressed stream is indeed the only scalable and practical solution. Hardware video encoders such as v4l2h264enc impose minimal CPU cost and drastically reduce bandwidth requirements, making it efficient to pass the stream to multiple consumer processes. Each consumer can then either forward the encoded data directly or decode it once using the hardware decoder. This approach avoids the high memory footprint and repeated buffer copies inherent in raw‑frame fan‑out designs, making it the architecture most commonly used in commercial embedded products.&lt;/P&gt;
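As an illustrative sketch of that topology (element names taken from your own pipelines; the mpegtsmux/udpsink transport, port, and file paths are examples, not a requirement):

```python
# Hypothetical encode-once fan-out: the producer hardware-encodes a single
# H.264 stream and publishes it locally; each consumer receives compressed
# data and remuxes or decodes it independently.
producer = (
    "v4l2src device=/dev/video3 io-mode=dmabuf ! "
    "video/x-raw,format=NV12,width=1920,height=1080,framerate=30/1 ! "
    "v4l2h264enc ! h264parse config-interval=-1 ! "
    "mpegtsmux ! udpsink host=127.0.0.1 port=5000"
)

# A recording consumer only remuxes the compressed stream: no decode at all.
recorder = (
    "udpsrc port=5000 ! tsdemux ! h264parse ! "
    "mp4mux ! filesink location=/tmp/record.mp4"
)

for pipeline in (producer, recorder):
    print("gst-launch-1.0 " + pipeline)
```

Note the earlier objection to UDP/RTP (that it "requires encoding") no longer applies here: the encode happens exactly once, in hardware, and every additional consumer costs only the negligible bandwidth of the compressed stream.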
&lt;P&gt;&lt;STRONG&gt;4. Are there NXP/i.MX specific mechanisms (DMA‑BUF export, v4l2 memory sharing, imx plugins) better suited for this?&lt;/STRONG&gt;&lt;BR /&gt;NXP’s i.MX8 platform provides solid mechanisms for zero‑copy GPU/VPU processing, such as V4L2 DMA‑BUF import/export, the G2D/GC7000 GPU, and zero‑copy GStreamer plugins, but these mechanisms only work inside a single process. Zero‑copy through DMA‑BUF works well between elements like v4l2src, v4l2convert, and v4l2h264enc, but does not extend across processes unless you manually implement FD passing. The NXP‑specific GStreamer plugins (e.g., imxv4l2videosrc, imxg2dvideoscale) likewise assume in‑process zero‑copy pipelines. Therefore, while these tools help within one pipeline, they do not solve the problem of multi‑process raw video distribution.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;5. Is there a recommended design pattern for this use case on embedded systems?&lt;/STRONG&gt;&lt;BR /&gt;Yes, the widely recommended pattern is to avoid sharing raw frames across processes and instead perform a single hardware encode in the producer, then distribute the compressed stream to as many consumers as needed. This minimizes memory bandwidth, avoids unnecessary raw buffer copies, and keeps processing efficiency aligned with the hardware capabilities of the i.MX8. If true raw access is required by more than one subsystem, then the system is typically designed to keep all raw‑processing components inside a single GStreamer pipeline using tee, while other processes interact with the system through control-plane IPC rather than consuming raw video. This pattern achieves optimal performance, scalability, and system isolation without overloading memory or CPU resources.&lt;/P&gt;</description>
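A sketch of that single-pipeline pattern (branches and sink locations are illustrative): with tee, every raw-processing branch shares the same reference-counted buffers, so adding a branch adds no per-frame copy the way an extra shmsrc consumer does.

```python
# Single-process raw fan-out with tee: all raw-frame consumers live in one
# pipeline, so buffers are reference-counted rather than copied per branch.
# Branches can be added/removed at runtime by requesting/releasing tee src
# pads programmatically (gst_element_request_pad / release_request_pad).
pipeline = (
    "v4l2src device=/dev/video3 io-mode=dmabuf ! "
    "video/x-raw,format=NV12,width=1920,height=1080,framerate=30/1 ! "
    "tee name=t "
    # Branch 1: hardware encode once, record to file.
    "t. ! queue ! v4l2h264enc ! h264parse ! mp4mux ! filesink location=/tmp/rec.mp4 "
    # Branch 2: hand raw frames to the in-process AI component.
    "t. ! queue ! appsink name=ai emit-signals=true max-buffers=2 drop=true"
)
print("gst-launch-1.0 " + pipeline)
```

The queue after each tee pad is important: it decouples the branches so a slow consumer (e.g., the AI branch) does not stall capture. External processes then subscribe to the encoded output or use control-plane IPC, as described above.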
      <pubDate>Tue, 03 Mar 2026 17:28:18 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/High-CPU-usage-when-sharing-raw-camera-video-across-multiple/m-p/2325682#M244259</guid>
      <dc:creator>Chavira</dc:creator>
      <dc:date>2026-03-03T17:28:18Z</dc:date>
    </item>
  </channel>
</rss>

