Hello,
I am trying to build a Python application for the FRDM i.MX93 that detects objects using the already provided ssdlite_mobilenet_v2_coco_quant_uint8_float32_no_postprocess_vela.tflite model, which is downloaded by the GoPoint detection demo. I want to get the bounding box coordinates and labels from the tensor converter and decoder (using GStreamer) in my application, in order to customize the overlays from Python and to do special processing for particular detected classes.
After reading the NNStreamer documentation I enabled option7 on the tensor decoder in the hope of getting that data, but I am not getting it. Instead, I get a byte stream of 1,228,800 bytes, which is a 640x480x4 frame. When I visualize this frame using OpenCV (cv2), it comes out as a dark black frame with blue bounding boxes and labels overlaid wherever an object is in front of the camera.
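For reference, this is roughly how I visualize that buffer (a sketch, assuming the 1,228,800 bytes are a 640x480 RGBA overlay, since 640 * 480 * 4 = 1,228,800):

import cv2
import numpy as np

def show_decoder_overlay(raw_data):
    # raw_data is the byte stream pulled from the tensor decoder's appsink;
    # I treat it as a 4-channel (RGBA) 640x480 frame based on its size.
    overlay = np.frombuffer(raw_data, dtype=np.uint8).reshape(480, 640, 4)
    # Drop alpha and swap to BGR for OpenCV display.
    cv2.imshow("Decoder overlay", cv2.cvtColor(overlay, cv2.COLOR_RGBA2BGR))
    cv2.waitKey(1)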
I also tried adding option7=1 in the batch script used by the detection demo, but I still cannot get the bounding box and label info on the terminal.
My pipeline string is as follows:
pipeline_str = (
    "v4l2src name=cam_src device=/dev/video0 num-buffers=-1 ! "
    "video/x-raw,width=640,height=480,framerate=30/1 ! "
    "tee name=t "
    "t. ! queue name=thread-nn max-size-buffers=2 leaky=2 ! "
    "imxvideoconvert_pxp ! video/x-raw,width=300,height=300,format=BGR ! "
    "videoconvert ! video/x-raw,format=RGB ! "
    "tensor_converter ! "
    "tensor_filter framework=tensorflow-lite model=/opt/gopoint-apps/downloads/ssdlite_mobilenet_v2_coco_quant_uint8_float32_no_postprocess_vela.tflite "
    "custom=Delegate:External,ExtDelegateLib:libethosu_delegate.so ! "
    "tensor_decoder mode=bounding_boxes "
    "option1=mobilenet-ssd "
    "option2=/opt/gopoint-apps/downloads/coco_labels_list.txt "
    "option3=/opt/gopoint-apps/downloads/box_priors.txt:0.5:10.0:10.0:0.5:0.5:0.5 "
    "option4=640:480 option5=300:300 option7=1 ! "
    "appsink name=npusink emit-signals=true max-buffers=1 drop=true "
    "t. ! queue name=thread-frame max-size-buffers=2 leaky=2 ! "
    "videoconvert ! video/x-raw,format=BGR ! appsink name=framesink emit-signals=true max-buffers=1 drop=true"
)
pipeline = Gst.parse_launch(pipeline_str)
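As an alternative, I have been considering teeing the tensor stream right after tensor_filter into a second appsink, so the raw output tensors also reach Python directly. This is an untested sketch; I do not know the exact output tensor layout of the no-postprocess model, so only the plumbing is shown, and "rawsink" is a name I made up:

# Hypothetical variant: split the tensor stream so the decoder overlay and
# the raw tensors are both available to the application.
alt_pipeline_str = (
    "v4l2src name=cam_src device=/dev/video0 ! "
    "video/x-raw,width=640,height=480,framerate=30/1 ! "
    "tee name=t "
    "t. ! queue max-size-buffers=2 leaky=2 ! "
    "imxvideoconvert_pxp ! video/x-raw,width=300,height=300,format=BGR ! "
    "videoconvert ! video/x-raw,format=RGB ! "
    "tensor_converter ! "
    "tensor_filter framework=tensorflow-lite model=/opt/gopoint-apps/downloads/ssdlite_mobilenet_v2_coco_quant_uint8_float32_no_postprocess_vela.tflite "
    "custom=Delegate:External,ExtDelegateLib:libethosu_delegate.so ! "
    "tee name=tensors "
    "tensors. ! queue ! tensor_decoder mode=bounding_boxes "
    "option1=mobilenet-ssd "
    "option2=/opt/gopoint-apps/downloads/coco_labels_list.txt "
    "option3=/opt/gopoint-apps/downloads/box_priors.txt:0.5:10.0:10.0:0.5:0.5:0.5 "
    "option4=640:480 option5=300:300 ! "
    "appsink name=npusink emit-signals=true max-buffers=1 drop=true "
    "tensors. ! queue ! appsink name=rawsink emit-signals=true max-buffers=1 drop=true "
    "t. ! queue max-size-buffers=2 leaky=2 ! "
    "videoconvert ! video/x-raw,format=BGR ! "
    "appsink name=framesink emit-signals=true max-buffers=1 drop=true"
)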
My sink processing functions are as follows:
def get_frame(frame_sink):
    sample = frame_sink.emit("pull-sample")
    if sample:
        buffer = sample.get_buffer()
        success, map_info = buffer.map(Gst.MapFlags.READ)
        if success:
            # Copy the data before unmapping so the returned bytes stay valid.
            frame_data = bytes(map_info.data)
            buffer.unmap(map_info)
            return frame_data
    else:
        print("No data got from FRAME SINK!")
    return None
def get_detections(npu_sink):
    sample = npu_sink.emit("pull-sample")
    if sample:
        buffer = sample.get_buffer()
        success, map_info = buffer.map(Gst.MapFlags.READ)
        # Note: my earlier metadata probe called Gst.Meta.api_type_get_tags()
        # without the required API GType argument, so it is removed here.
        if success:
            try:
                raw_data = bytes(map_info.data)
                print("NPU DATA type:", type(raw_data), " len:", len(raw_data))
                data_str = raw_data.decode("utf-8", errors="replace")
                # json.loads (not json.load) is needed for an in-memory string.
                detection_data = json.loads(data_str)
                return (detection_data.get("bounding_boxes", []),
                        detection_data.get("labels", []))
                # Earlier attempt, assuming a flat float32 layout of
                # [x, y, w, h, class_id, conf] per detection:
                # array = np.frombuffer(raw_data, dtype=np.float32)
                # array = array.reshape((len(array) // 6, 6))
                # filtered = array[array[:, 5] > 0]
                # detections = []
                # for x, y, w, h, class_id, conf in filtered:
                #     detections.append({"x": x, "y": y, "w": w, "h": h,
                #                        "class_id": class_id, "conf": conf})
                # return detections
            except Exception as e:
                print("error parsing NPU data:", e)
            finally:
                buffer.unmap(map_info)
    else:
        print("No data got from NPU SINK!")
    return [], []
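start_pipeline() is not shown here; roughly, it parses the pipeline string, fetches the sinks by name, and starts playback (a reconstructed sketch matching the element names used above):

def start_pipeline():
    global npusink, framesink
    Gst.init(None)
    p = Gst.parse_launch(pipeline_str)
    # Grab the appsinks declared in the pipeline string by their names.
    npusink = p.get_by_name("npusink")
    framesink = p.get_by_name("framesink")
    p.set_state(Gst.State.PLAYING)
    return p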
def main():
    global pipeline, npusink, framesink
    pipeline = start_pipeline()
    while True:
        # Blocks until the pipeline reaches a stable state.
        state = pipeline.get_state(Gst.CLOCK_TIME_NONE)
        frame = get_frame(framesink)
        bounding_boxes, labels = get_detections(npusink)
        if frame is not None:
            frame_np = np.frombuffer(frame, dtype=np.uint8).reshape(480, 640, 3)
            # TODO: extract the bounding box and label information from npusink
            # and pass the frame, bounding box info, and labels to
            # recognize_faces() to get overlay and unique person detections
            cv2.imshow("Detection Output", frame_np)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
    cleanup()
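For context, once I do get coordinates and labels, the custom overlay I plan to draw looks roughly like this (a sketch only; the {x, y, w, h, class_id, conf} dictionary format comes from my commented-out parsing attempt above and is an assumption):

def draw_detections(frame_np, detections, labels):
    # Draw one rectangle and label per detection. Assumes pixel coordinates;
    # normalized values would need scaling by the frame size first.
    for det in detections:
        x, y = int(det["x"]), int(det["y"])
        w, h = int(det["w"]), int(det["h"])
        cv2.rectangle(frame_np, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cls = int(det["class_id"])
        name = labels[cls] if cls < len(labels) else "unknown"
        cv2.putText(frame_np, "%s %.2f" % (name, det["conf"]),
                    (x, max(y - 5, 0)), cv2.FONT_HERSHEY_SIMPLEX,
                    0.5, (0, 255, 0), 1)
    return frame_np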
Please help me with this query. I also have a hypothesis that the model provided in the detection example is not compatible with option7 of the tensor decoder, and that this is why I get a complete, already-overlaid black frame instead of the individual box/label data. If so, can you suggest another compatible model that can do the same thing? It would be great if you could also provide the model properties.
P.S.: I am attaching the demo application as well: