Hi Erlend,
there is another demo we have that might help you:
Attached is the patch (0001-YUYV-to-NV12-Converter.patch), which implements a YUYV -> NV12 converter on the GPU. For simplicity, I modified the DirectMultiSamplingVideoYUV demo.
The 0001-GPU-YUV-RGB.patch just modifies the original DirectMultiSamplingVideoYUV project.
The solution was developed on Linux using the GPU SDK demo gtec-demo-framework/DemoApps/GLES3/DirectMultiSamplingVideoYUV (master branch of NXPmicro/gtec-demo-framework on GitHub).
The application creates a video in NV12 format from the camera output in YUV422 (YUYV).
The example works in the following way:
1.- Set up GStreamer to output the camera video in the default YUV422 (YUYV) format.
2.- The GPU maps a texture directly onto the video buffer. The texture is defined as RGBA because we need four 8-bit components per texel (the YUYV macropixel: Y0, U, Y1, V).
3.- Create two color attachments as render targets. These buffers/render targets will hold the Y plane and the interleaved UV plane of the NV12 format.
4.- In the fragment shader, the Y samples are copied to the first render target, while Cb and Cr are downsampled vertically by ½ (taking the 4:2:2 source to 4:2:0) and written to the second render target.
5.- Use PBOs to get a reference to the pixels in each render target, so we end up with one pointer to the Y plane and a separate one to the UV plane.
This is shown in more detail below:
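First, the capture side (step 1). This is only a minimal sketch and is not taken from the patch: the device path, resolution and element names are assumptions, and the real application feeds the buffers into the demo framework rather than a bare appsink. Note that YUY2 is GStreamer's name for the YUYV layout:

#include <gst/gst.h>

int main(int argc, char** argv)
{
    gst_init(&argc, &argv);

    // Ask the camera for raw YUV422 in the YUYV (YUY2) layout; /dev/video0,
    // 1280x720 and the sink name are illustrative assumptions.
    GError* error = nullptr;
    GstElement* pipeline = gst_parse_launch(
        "v4l2src device=/dev/video0 ! "
        "video/x-raw,format=YUY2,width=1280,height=720 ! "
        "appsink name=videosink",
        &error);
    if (pipeline == nullptr)
    {
        g_printerr("Failed to create pipeline: %s\n", error->message);
        g_clear_error(&error);
        return 1;
    }

    gst_element_set_state(pipeline, GST_STATE_PLAYING);
    // ... pull the YUYV buffers from the appsink and map them to the GPU
    // texture (step 2) ...
    return 0;
}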
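For steps 2 and 4, the patch uses one fragment shader that writes to both color attachments at once; the sketch below instead splits the work into two small passes (one per plane) so that each render target can have exactly the byte layout of its NV12 plane. It assumes the camera frame is already mapped to an RGBA texture of (W/2) x H texels, one YUYV macropixel per texel (R=Y0, G=U, B=Y1, A=V); all identifiers are mine, not the patch's:

// Shared full-screen triangle; no vertex buffers are needed.
constexpr const char* kVertexSrc = R"(#version 300 es
void main()
{
    vec2 p = vec2((gl_VertexID << 1) & 2, gl_VertexID & 2);
    gl_Position = vec4(p * 2.0 - 1.0, 0.0, 1.0);
})";

// Pass 1: Y plane. Rendered into a (W/4) x H RGBA8 target, so every output
// texel packs four consecutive luma bytes and one readback row is exactly
// one W-byte NV12 luma row.
constexpr const char* kYFragSrc = R"(#version 300 es
precision mediump float;
uniform sampler2D uYuyv;    // (W/2) x H camera texture: (Y0, U, Y1, V)
out vec4 oY;
void main()
{
    ivec2 p = ivec2(gl_FragCoord.xy);
    vec4 a = texelFetch(uYuyv, ivec2(p.x * 2,     p.y), 0);
    vec4 b = texelFetch(uYuyv, ivec2(p.x * 2 + 1, p.y), 0);
    oY = vec4(a.r, a.b, b.r, b.b);   // Y0 Y1 Y2 Y3
})";

// Pass 2: UV plane. Rendered into a (W/4) x (H/2) RGBA8 target: each output
// texel packs two interleaved CbCr pairs, and the chroma of two source rows
// is averaged, i.e. the vertical 1/2 downsample from 4:2:2 to 4:2:0.
constexpr const char* kUvFragSrc = R"(#version 300 es
precision mediump float;
uniform sampler2D uYuyv;
out vec4 oUV;
void main()
{
    ivec2 p = ivec2(gl_FragCoord.xy);
    int sx = p.x * 2;
    int sy = p.y * 2;
    vec2 uv0 = 0.5 * (texelFetch(uYuyv, ivec2(sx,     sy    ), 0).ga +
                      texelFetch(uYuyv, ivec2(sx,     sy + 1), 0).ga);
    vec2 uv1 = 0.5 * (texelFetch(uYuyv, ivec2(sx + 1, sy    ), 0).ga +
                      texelFetch(uYuyv, ivec2(sx + 1, sy + 1), 0).ga);
    oUV = vec4(uv0, uv1);            // U0 V0 U1 V1
})";

Packing four bytes per RGBA8 texel keeps the readback at GL_RGBA / GL_UNSIGNED_BYTE, which GLES3 always accepts for glReadPixels. Depending on how the source buffer is mapped you may also need to flip Y in the vertex shader.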
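And for steps 3 and 5, a sketch of the render-target and PBO plumbing under the same assumptions (a current GLES3 context, the two programs above already compiled and linked; again, all names are illustrative):

#include <GLES3/gl3.h>
#include <cassert>
#include <cstdint>

struct Plane
{
    GLuint tex;    // color attachment holding one NV12 plane
    GLuint fbo;
    GLuint pbo;    // pixel pack buffer used for the readback
    GLsizei w, h;  // target size in texels (4 plane bytes per texel)
};

static Plane MakePlane(GLsizei w, GLsizei h)
{
    Plane p = {0, 0, 0, w, h};
    glGenTextures(1, &p.tex);
    glBindTexture(GL_TEXTURE_2D, p.tex);
    glTexStorage2D(GL_TEXTURE_2D, 1, GL_RGBA8, w, h);

    glGenFramebuffers(1, &p.fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, p.fbo);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                           GL_TEXTURE_2D, p.tex, 0);
    assert(glCheckFramebufferStatus(GL_FRAMEBUFFER) == GL_FRAMEBUFFER_COMPLETE);

    glGenBuffers(1, &p.pbo);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, p.pbo);
    glBufferData(GL_PIXEL_PACK_BUFFER, w * h * 4, nullptr, GL_STREAM_READ);
    return p;
}

// Per frame, after drawing the full-screen triangle into p.fbo with the
// matching program and glViewport(0, 0, p.w, p.h):
static const uint8_t* ReadPlane(const Plane& p)
{
    glBindFramebuffer(GL_FRAMEBUFFER, p.fbo);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, p.pbo);
    // With a pack PBO bound, the last argument is an offset into the PBO.
    glReadPixels(0, 0, p.w, p.h, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    return static_cast<const uint8_t*>(glMapBufferRange(
        GL_PIXEL_PACK_BUFFER, 0, p.w * p.h * 4, GL_MAP_READ_BIT));
}

// Usage for a W x H camera frame:
//   Plane yPlane  = MakePlane(W / 4, H);      // W x H     bytes of Y
//   Plane uvPlane = MakePlane(W / 4, H / 2);  // W x H / 2 bytes of CbCr
//   const uint8_t* y  = ReadPlane(yPlane);    // pointer to the Y plane
//   const uint8_t* uv = ReadPlane(uvPlane);   // pointer to the UV plane
//   ... consume the NV12 frame, then glUnmapBuffer(GL_PIXEL_PACK_BUFFER)
//   with each PBO bound before reusing it ...

Note that glMapBufferRange waits for the readback to finish; a real application would double-buffer the PBOs to keep the transfer asynchronous.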

Regards,
Andre