Need guidance: YOLOv8 output shape mismatch on NNStreamer (i.MX8MP) + pipeline design question (mini


SiddavatamVishnu
Contributor II

HARDWARE AND SOFTWARE DETAILS

i.MX 8M Plus, Linux BSP LF6.12.34_2.1.0

 

Goal

I’m building a command-line pipeline (no GUI) where inference, overlay, and display all run in GStreamer and NNStreamer on an i.MX8MP.

Current experiment (command line)

I tried this pipeline:

```shell
GST_DEBUG=GStreamer:4,tensor_filter:6,tensor_transform:6,tensor_decoder:7 \
gst-launch-1.0 --no-position \
v4l2src device=/dev/video4 num-buffers=200 ! \
video/x-raw,width=1920,height=1080,format=NV12,framerate=30/1 ! \
imxvideoconvert_g2d ! \
video/x-raw,width=320,height=320,format=RGBA ! \
videoconvert ! \
video/x-raw,width=320,height=320,format=BGR ! \
tensor_converter ! \
tensor_transform mode=arithmetic option=typecast:int8,add:-128 ! \
tensor_filter framework=tensorflow-lite model=${MODEL} custom=Delegate:External,ExtDelegateLib:${VX_LIB} ! \
tensor_transform mode=arithmetic option=typecast:float32,add:128.0,mul:0.004982381127774715 ! \
tensor_transform mode=transpose option=1:0:2 ! \
tensor_decoder mode=bounding_boxes option1=yolov8 option2=${LABELS} option3=0 option4=1920:1080 option5=320:320 ! \
cairooverlay name=overlay ! \
videoconvert ! \
autovideosink
```
log file link LINK 

Problem

My YOLOv8 TFLite model outputs (1, 7, 2100), but NNStreamer on i.MX8MP expects 7 × 2100 × 1.

I received this explanation:

The YOLOv8 TFLite model outputs (1,7,2100), but NNStreamer’s YOLOv8 decoder for i.MX8MP expects 7×2100×1. This BSP version only supports transpose on 4D tensors, so the model output needs dequantization, reshape to (1,7,2100,1), then transpose.

 

Model details:

  • input: int8 [1, 320, 320, 3]

  • output: int8 [1, 7, 2100]

  • scale/zero point

  • output correctly contains 3 classes + 4 bbox values
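
In numpy terms, the dequantize → reshape-to-4D → transpose chain the quoted explanation describes looks roughly like this. This is a sketch only: the random buffer stands in for real model output, the zero point of -128 is inferred from the pipeline's `add:128.0`, and the final axis order assumes NNStreamer's width-first 7×2100×1 notation.

```python
import numpy as np

SCALE = 0.004982381127774715   # quantization scale quoted in the original pipeline
ZERO_POINT = -128              # assumed, inferred from tensor_transform add:128.0

# stand-in for the int8 model output of shape (1, 7, 2100)
q_out = np.random.randint(-128, 128, size=(1, 7, 2100), dtype=np.int8)

# 1. dequantize: float = (q - zero_point) * scale
#    (matches mode=arithmetic option=typecast:float32,add:128.0,mul:0.004982...)
f_out = (q_out.astype(np.float32) - ZERO_POINT) * SCALE

# 2. reshape to 4D, since this BSP only supports transpose on 4D tensors
f_4d = f_out.reshape(1, 7, 2100, 1)

# 3. transpose so each of the 2100 candidates has its 7 values contiguous
#    (i.e. 7x2100x1 in NNStreamer's width-first dimension notation, assumed)
decoded_in = f_4d.transpose(0, 2, 1, 3)   # -> (1, 2100, 7, 1)
print(decoded_in.shape)
```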


Current (slow) approach

Right now the application flow is:

  1. GStreamer → BGR → OpenCV

  2. NPU inference

  3. OpenCV post-processing

  4. Back to RTSP pipeline

This introduces multiple software videoconvert stages; even under ideal conditions we reach only ~20 FPS, although the model alone can run at 60+ FPS.


Proposed new approach

I want to split the pipeline:

Path A — Inference

Convert NV12 → BGR only here
Run NNStreamer

Path B — Overlay + Display

Keep original NV12/YUY2 frames
Draw bounding boxes directly on NV12 (preferably using hardware)

→ Feed NV12 to encoder / RTSP
→ Avoid software videoconvert completely

I’d first like to prototype this using pure gst-launch, then apply the approach in Python (possibly using OpenGL for NV12 overlay).


What I need help on

  1. How to reshape/transpose (1,7,2100) TFLite output into the format required by NNStreamer’s YOLOv8 decoder on i.MX8MP

    • Any working example using only tensor_filter / transform / decoder?

    • Is there a known workaround for the 3-D output?

  2. Best practice for overlay on NV12/YUY2

    • Any NNStreamer-friendly way to draw boxes directly on NV12?

    • Recommended elements (cairooverlay on NV12? OpenGL? v4l2convert? imxvideoconvert_g2d overlays?)

  3. General advice: Is the split-pipeline (inference on BGR, overlay on NV12) a reasonable architectural direction on i.MX8MP?
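
For context on what the decoder consumes: the 7 channels per candidate are 4 bbox values plus 3 class scores. The in-pipeline tensor_decoder handles this itself; the sketch below is only a numpy illustration of that layout, assuming a cx,cy,w,h box encoding in input-image pixels and omitting NMS (the function name and threshold are mine, not NNStreamer's).

```python
import numpy as np

def decode_yolov8(head, conf_thresh=0.25):
    """Minimal decode of a (7, 2100) YOLOv8 head: rows 0-3 are assumed to be
    cx,cy,w,h; rows 4-6 are per-class scores. NMS is intentionally omitted."""
    boxes, scores = head[:4], head[4:]
    cls = scores.argmax(axis=0)        # best class per candidate
    conf = scores.max(axis=0)          # its score
    keep = conf >= conf_thresh
    cx, cy, w, h = boxes[:, keep]
    # convert center format to corner format
    x1, y1 = cx - w / 2, cy - h / 2
    x2, y2 = cx + w / 2, cy + h / 2
    return np.stack([x1, y1, x2, y2]).T, cls[keep], conf[keep]

# toy head with one confident candidate
head = np.zeros((7, 2100), dtype=np.float32)
head[:4, 0] = [160, 160, 40, 20]   # cx, cy, w, h
head[4, 0] = 0.9                   # class-0 score
xyxy, cls, conf = decode_yolov8(head)
print(xyxy[0], cls[0], conf[0])
```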


1 Solution
SiddavatamVishnu
Contributor II

I have an update: I used the dimension-change mode (dimchg) of the tensor_transform element to match the expectations of the tensor_decoder.

```shell
.... ! tensor_filter framework=tensorflow-lite model=../../vaishnavi/model_calibrated_int8_og_320.tflite custom=Delegate:External,ExtDelegateLib:libvx_delegate.so ! \
tensor_transform mode=arithmetic option=typecast:float32,add:128.0,mul:0.004982381 ! \
tensor_transform mode=dimchg option=0:1 ! \
tensor_decoder mode=bounding_boxes option1=yolov8 option2=labels_over.txt option3=0 option4=1920:1080 option5=320:320 ! ......
```
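
If I understand dimchg correctly, option=0:1 moves NNStreamer dimension 0 to position 1, which physically reorders the data the same way a transpose of the two innermost dimensions would. A numpy illustration of that assumed equivalence (remember NNStreamer lists dimensions width-first, so TFLite's (1, 7, 2100) appears as 2100:7:1:1):

```python
import numpy as np

# TFLite reports the output as (1, 7, 2100); NNStreamer lists the same
# buffer width-first as 2100:7:1:1. "dimchg option=0:1" is assumed here to
# move NNStreamer dim 0 to position 1, i.e. swap the two innermost
# dimensions -- in numpy (slowest-first) terms, swap the last two axes.
out = np.arange(7 * 2100, dtype=np.float32).reshape(1, 7, 2100)
swapped = out.swapaxes(1, 2)   # -> (1, 2100, 7), NNStreamer 7:2100:1:1
print(swapped.shape)
```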



Thanks for the replies.

Thanks and Regards
Siddavatam Vishnu

Bio_TICFSL
NXP TechSupport

Hello,

The issue stems from a mismatch between your YOLOv8 TFLite model output shape (1,7,2100) and the format expected by NNStreamer's YOLOv8 decoder on i.MX8MP (7×2100×1). This is happening because:

1. Your current BSP version (LF6.12.34_2.1.0) supports transpose operations only on 4D tensors
2. The model output needs reshaping and transposing to match the decoder's expectations

## Recommended Solution

For the tensor transformation, you need to apply:
1. Dequantization (if using quantized model)
2. Reshape the output from (1,7,2100) to (1,7,2100,1)
3. Transpose the tensor to the required format (7×2100×1)

## Pipeline Optimization

Your proposed architecture (splitting the pipeline) is a sound approach:
- Convert NV12 → BGR only for inference
- Keep original NV12/YUY2 frames for display/encoding
- Overlay detection results directly on NV12 using hardware acceleration

This will eliminate software videoconvert operations and achieve better performance.

## Recommended Elements for NV12 Pipeline

For drawing bounding boxes directly on NV12:
- Use `imxvideoconvert_g2d` with overlay capability
- Alternative: `cairooverlay` can work with NV12 but may require format adaptation

Example pipeline structure:
```
v4l2src → NV12 → tee → branch1: convert to BGR → inference → detection results
branch2: original NV12 → imxvideoconvert_g2d (with overlay) → encoder/display
```

This approach should significantly improve performance beyond your current ~20 FPS limit, leveraging the NPU's 60+ FPS capability by eliminating unnecessary format conversions.

 

Regards

SiddavatamVishnu
Contributor II
Thanks for the quick response. I had tried reshape in tensor_transform before, but it returned an error:

```shell
root@imx8mpevk:~# export MODEL=/root/rtsp/testing1/saved_model_triding_320.tflite
root@imx8mpevk:~# export LABELS=/root/rtsp/testing1/labels.txt
root@imx8mpevk:~# gst-launch-1.0 --no-position v4l2src device=/dev/video4 ! \
  video/x-raw,format=NV12,width=1920,height=1080,framerate=30/1 ! \
  imxvideoconvert_g2d ! video/x-raw,width=320,height=320,format=RGBA ! \
  videoconvert ! video/x-raw,width=320,height=320,format=BGR ! \
  tensor_converter ! tensor_transform mode=arithmetic option=typecast:int8,add:-128 ! \
  tensor_filter framework=tensorflow-lite model=${MODEL} ! \
  tensor_transform mode=arithmetic option=typecast:float32,add:128.0,mul:0.004982381 ! \
  tensor_transform mode=reshape option=1:7:2100:1 ! \
  tensor_transform mode=transpose option=1:0:2:3 ! \
  tensor_decoder mode=bounding_boxes option1=yolov8 option2=${LABELS} option3=0 option4=1920:1080 option5=320:320 ! \
  cairooverlay name=overlay ! videoconvert ! autovideosink
** Message: 14:56:17.877: accl = cpu

** (gst-launch-1.0:1335): CRITICAL **: 14:56:17.931: bb_getOutCaps: assertion 'config->info.info[0].type == _NNS_FLOAT32' failed
(the same assertion is printed five more times)

WARNING: erroneous pipeline: could not set property "mode" in element "tensor_transform" to "reshape"
root@imx8mpevk:~#
```

Could you please suggest changes to the pipeline to match the expected formats, and also for overlaying on NV12?

Or do I need to downgrade or upgrade my Linux BSP version to match them?

If possible, please test the pipeline and report back.

Thanks and Regards
S Vishnu
Bio_TICFSL
NXP TechSupport

Hi,

Yes, you may need to downgrade your Linux BSP. I expect that will work, but we have not yet tested this pipeline with YOLOv8.

Regards
