Hello,
I am trying to build a Python application for the FRDM i.MX93 that detects objects using the already provided ssdlite_mobilenet_v2_coco_quant_uint8_float32_no_postprocess_vela.tflite model, which is downloaded by the GoPoint detection demo. I want to get the bounding box coordinates and labels from the tensor converter and decoder (using GStreamer) in my application, in order to customize the overlays from Python and to do special processing for particular detected classes.
After reading the NNStreamer documentation I enabled option7 on the tensor decoder in the hope of getting that data, but I am not getting it. Instead, I get a byte stream of 1,228,800 bytes, which is a 640x480x4 frame. When I visualize this frame using OpenCV (cv2), it comes out as a dark black frame with blue bounding boxes and labels overlaid wherever an object is in front of the camera.
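For reference, this is roughly how I visualize that buffer (a sketch, assuming the 1,228,800 bytes are a 640x480 RGBA overlay, since 640 * 480 * 4 = 1,228,800):

import cv2
import numpy as np

def show_decoder_overlay(raw_data):
    # raw_data is the byte stream pulled from the tensor decoder's appsink;
    # I treat it as a 4-channel (RGBA) 640x480 frame based on its size.
    overlay = np.frombuffer(raw_data, dtype=np.uint8).reshape(480, 640, 4)
    # Drop alpha and swap to BGR for OpenCV display.
    cv2.imshow("Decoder overlay", cv2.cvtColor(overlay, cv2.COLOR_RGBA2BGR))
    cv2.waitKey(1)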
I also tried adding option7=1 in the batch script used by the detection demo, but I still cannot get the bounding box and label info on the terminal.
My pipeline string is as follows:
pipeline_str = (
    "v4l2src name=cam_src device=/dev/video0 num-buffers=-1 ! "
    "video/x-raw,width=640,height=480,framerate=30/1 ! "
    "tee name=t "
    "t. ! queue name=thread-nn max-size-buffers=2 leaky=2 ! "
    "imxvideoconvert_pxp ! video/x-raw,width=300,height=300,format=BGR ! "
    "videoconvert ! video/x-raw,format=RGB ! "
    "tensor_converter ! "
    "tensor_filter framework=tensorflow-lite model=/opt/gopoint-apps/downloads/ssdlite_mobilenet_v2_coco_quant_uint8_float32_no_postprocess_vela.tflite "
    "custom=Delegate:External,ExtDelegateLib:libethosu_delegate.so ! "
    "tensor_decoder mode=bounding_boxes "
    "option1=mobilenet-ssd "
    "option2=/opt/gopoint-apps/downloads/coco_labels_list.txt "
    "option3=/opt/gopoint-apps/downloads/box_priors.txt:0.5:10.0:10.0:0.5:0.5:0.5 "
    "option4=640:480 option5=300:300 option7=1 ! "
    "appsink name=npusink emit-signals=true max-buffers=1 drop=true "
    "t. ! queue name=thread-frame max-size-buffers=2 leaky=2 ! "
    "videoconvert ! video/x-raw,format=BGR ! appsink name=framesink emit-signals=true max-buffers=1 drop=true"
)
pipeline = Gst.parse_launch(pipeline_str)
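As an alternative, I have been considering teeing the tensor stream right after tensor_filter into a second appsink, so the raw output tensors also reach Python directly. This is an untested sketch; I do not know the exact output tensor layout of the no-postprocess model, so only the plumbing is shown, and "rawsink" is a name I made up:

# Hypothetical variant: split the tensor stream so the decoder overlay and
# the raw tensors are both available to the application.
alt_pipeline_str = (
    "v4l2src name=cam_src device=/dev/video0 ! "
    "video/x-raw,width=640,height=480,framerate=30/1 ! "
    "tee name=t "
    "t. ! queue max-size-buffers=2 leaky=2 ! "
    "imxvideoconvert_pxp ! video/x-raw,width=300,height=300,format=BGR ! "
    "videoconvert ! video/x-raw,format=RGB ! "
    "tensor_converter ! "
    "tensor_filter framework=tensorflow-lite model=/opt/gopoint-apps/downloads/ssdlite_mobilenet_v2_coco_quant_uint8_float32_no_postprocess_vela.tflite "
    "custom=Delegate:External,ExtDelegateLib:libethosu_delegate.so ! "
    "tee name=tensors "
    "tensors. ! queue ! tensor_decoder mode=bounding_boxes "
    "option1=mobilenet-ssd "
    "option2=/opt/gopoint-apps/downloads/coco_labels_list.txt "
    "option3=/opt/gopoint-apps/downloads/box_priors.txt:0.5:10.0:10.0:0.5:0.5:0.5 "
    "option4=640:480 option5=300:300 ! "
    "appsink name=npusink emit-signals=true max-buffers=1 drop=true "
    "tensors. ! queue ! appsink name=rawsink emit-signals=true max-buffers=1 drop=true "
    "t. ! queue max-size-buffers=2 leaky=2 ! "
    "videoconvert ! video/x-raw,format=BGR ! "
    "appsink name=framesink emit-signals=true max-buffers=1 drop=true"
)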
My sink processing functions are as follows:
def get_frame(frame_sink):
    sample = frame_sink.emit("pull-sample")
    if sample:
        buffer = sample.get_buffer()
        success, map_info = buffer.map(Gst.MapFlags.READ)
        if success:
            # Copy the data before unmapping so the returned bytes stay valid.
            frame_data = bytes(map_info.data)
            buffer.unmap(map_info)
            return frame_data
    else:
        print("No data got from FRAME SINK!")
    return None
def get_detections(npu_sink):
    sample = npu_sink.emit("pull-sample")
    if sample:
        buffer = sample.get_buffer()
        success, map_info = buffer.map(Gst.MapFlags.READ)
        # Note: my earlier metadata probe called Gst.Meta.api_type_get_tags()
        # without the required API GType argument, so it is removed here.
        if success:
            try:
                raw_data = bytes(map_info.data)
                print("NPU DATA type:", type(raw_data), " len:", len(raw_data))
                data_str = raw_data.decode("utf-8", errors="replace")
                # json.loads (not json.load) is needed for an in-memory string.
                detection_data = json.loads(data_str)
                return (detection_data.get("bounding_boxes", []),
                        detection_data.get("labels", []))
                # Earlier attempt, assuming a flat float32 layout of
                # [x, y, w, h, class_id, conf] per detection:
                # array = np.frombuffer(raw_data, dtype=np.float32)
                # array = array.reshape((len(array) // 6, 6))
                # filtered = array[array[:, 5] > 0]
                # detections = []
                # for x, y, w, h, class_id, conf in filtered:
                #     detections.append({"x": x, "y": y, "w": w, "h": h,
                #                        "class_id": class_id, "conf": conf})
                # return detections
            except Exception as e:
                print("error parsing NPU data:", e)
            finally:
                buffer.unmap(map_info)
    else:
        print("No data got from NPU SINK!")
    return [], []
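start_pipeline() is not shown here; roughly, it parses the pipeline string, fetches the sinks by name, and starts playback (a reconstructed sketch matching the element names used above):

def start_pipeline():
    global npusink, framesink
    Gst.init(None)
    p = Gst.parse_launch(pipeline_str)
    # Grab the appsinks declared in the pipeline string by their names.
    npusink = p.get_by_name("npusink")
    framesink = p.get_by_name("framesink")
    p.set_state(Gst.State.PLAYING)
    return p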
def main():
    global pipeline, npusink, framesink
    pipeline = start_pipeline()
    while True:
        # Blocks until the pipeline reaches a stable state.
        state = pipeline.get_state(Gst.CLOCK_TIME_NONE)
        frame = get_frame(framesink)
        bounding_boxes, labels = get_detections(npusink)
        if frame is not None:
            frame_np = np.frombuffer(frame, dtype=np.uint8).reshape(480, 640, 3)
            # TODO: extract the bounding box and label information from npusink
            # and pass the frame, bounding box info, and labels to
            # recognize_faces() to get overlay and unique person detections
            cv2.imshow("Detection Output", frame_np)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
    cleanup()
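For context, once I do get coordinates and labels, the custom overlay I plan to draw looks roughly like this (a sketch only; the {x, y, w, h, class_id, conf} dictionary format comes from my commented-out parsing attempt above and is an assumption):

def draw_detections(frame_np, detections, labels):
    # Draw one rectangle and label per detection. Assumes pixel coordinates;
    # normalized values would need scaling by the frame size first.
    for det in detections:
        x, y = int(det["x"]), int(det["y"])
        w, h = int(det["w"]), int(det["h"])
        cv2.rectangle(frame_np, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cls = int(det["class_id"])
        name = labels[cls] if cls < len(labels) else "unknown"
        cv2.putText(frame_np, "%s %.2f" % (name, det["conf"]),
                    (x, max(y - 5, 0)), cv2.FONT_HERSHEY_SIMPLEX,
                    0.5, (0, 255, 0), 1)
    return frame_np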
Please help me with this query. I also have a hypothesis that the model provided in the detection example is not compatible with option7 of the tensor decoder, and that this is why I get a complete, already-overlaid black frame instead of the individual box/label data. If so, can you suggest another compatible model that can do the same thing? It would be great if you could also provide the model properties.
P.S.: I am attaching the demo application as well: