Hi all,

I have a TFLite model. In Python I specify the external delegate and observe correct behavior: the first warmup run takes ~46 seconds, and each execution thereafter takes ~0.5 seconds.

I converted the code to C++ and specified the external_delegate options with vx_delegate. The options initialize, but the interpreter still uses the XNNPACK delegate when the builder and interpreter are constructed. The C++ behavior shows that vx_delegate is not being called: the first warmup run and every run thereafter take ~46 seconds, i.e. no hardware acceleration is occurring.

The model has float inputs and outputs. This didn't bother the Python version. In C++ I tried specifying

interpreter->SetAllowFp16PrecisionForFp32(true);

but this caused a segmentation fault, so I removed it.
Python code:
setup the interpreter:
self.interpreter = tflite.Interpreter(model_path='./model_integer_quant.tflite', experimental_delegates=[tflite.load_delegate('/usr/lib/libvx_delegate.so')])
self.interpreter.allocate_tensors()
self.input_details = self.interpreter.get_input_details()
self.output_details = self.interpreter.get_output_details()
# check the type of the input tensor
self.floating_model = self.input_details[0]['dtype'] == np.float32
print("self.floating_model is ", self.floating_model)
invoke the interpreter:
def detect(self, image, object=None):
    interpreter = self.interpreter
    input_data = self.preprocess(image)
    interpreter.set_tensor(self.input_details[0]['index'], input_data)
    # start_time = time.time()
    interpreter.invoke()
    # stop_time = time.time()
    output_data = interpreter.get_tensor(self.output_details[0]['index'])
    results = self.postprocess(output_data)
    if object is not None:
        results = [result for result in results if result['cls_name'] == object]
    return results
preprocessing inputs:
def preprocess(self, image):
    # load image
    if isinstance(image, str):  # Load from file path
        if not os.path.isfile(image):
            raise ValueError("Input image file path (" + image + ") does not exist.")
        image = cv.imread(image)
    elif isinstance(image, np.ndarray):  # Use given NumPy array
        image = image.copy()
    else:
        raise ValueError("Invalid image input. Only file paths or a NumPy array are accepted.")
    self.img_height = image.shape[0]
    self.img_width = image.shape[1]
    # resize and padding
    image = self.letterbox(image)
    # BGR -> RGB
    image = image[:, :, ::-1]
    # image = cv.cvtColor(image, cv.COLOR_BGR2RGB)
    # add N dim
    input_data = np.expand_dims(image, axis=0)
    if self.floating_model:
        input_data = np.float32(input_data) / 255  # input data is np.float32
    else:
        input_data = input_data.astype(np.int8)
    return input_data
Python output:
Vx delegate: allowed_builtin_code set to 0.
Vx delegate: error_during_init set to 0.
Vx delegate: error_during_prepare set to 0.
Vx delegate: error_during_invoke set to 0.
self.floating_model is True
W [HandleLayoutInfer:257]Op 19: default layout inference pass.
W [HandleLayoutInfer:257]Op 18: default layout inference pass.
(the two warnings above repeat, 10 lines in total)
W [op_optimize:676]stride slice copy tensor.
(the warning above repeats, 16 lines in total)
Processing ./image0frame37373.jpg - time: 47.29401898384094 s
Processing ./image0frame37954.jpg - time: 0.4757547378540039 s
Processing ./image0frame40189.jpg - time: 0.4649374485015869 s
Processing ./image0frame30.jpg - time: 0.46975016593933105 s
Processing ./image0frame937.jpg - time: 0.46828198432922363 s
Processing ./image0frame674.jpg - time: 0.46784138679504395 s
Processing ./image0frame37487.jpg - time: 0.4682295322418213 s
Processing ./image0frame36965.jpg - time: 0.475665807723999 s
Processing ./image0frame40527.jpg - time: 0.4699666500091553 s
Processing ./image0frame852.jpg - time: 0.4759962558746338 s
Processing ./result1.jpg - time: 0.46816182136535645 s
Processing ./image0frame1.jpg - time: 0.4685957431793213 s
Processing ./image0frame40183.jpg - time: 0.46500706672668457 s
Processing ./image0frame36962.jpg - time: 0.47954559326171875 s
Processing ./image0frame842.jpg - time: 0.472883939743042 s
Processing ./image0frame38968.jpg - time: 0.4709169864654541 s
Processing ./result0.jpg - time: 0.46674442291259766 s
Processing ./viper_snapshot.jpg - time: 0.4655272960662842 s
Processing ./image0frame40808.jpg - time: 0.464113712310791 s
Processing ./image0frame40814.jpg - time: 0.4668314456939697 s
Processing ./image0frame668.jpg - time: 0.46632933616638184 s
Processing ./image0frame900.jpg - time: 0.47098541259765625 s
Processing ./image0frame37379.jpg - time: 0.4749300479888916 s
Processing ./image0frame0.jpg - time: 0.46938037872314453 s
Processing ./image0frame420.jpg - time: 0.4686119556427002 s
Processing ./image0frame37956.jpg - time: 0.477811336517334 s
Processing ./image0frame38974.jpg - time: 0.4715125560760498 s
Processing ./image0frame40532.jpg - time: 0.475177526473999 s
Processing ./image0frame37484.jpg - time: 0.4749460220336914 s
The above code works. I try to do the same thing in C++:
std::unique_ptr<tflite::FlatBufferModel> model =
tflite::FlatBufferModel::BuildFromFile("model_integer_quant.tflite");
cout << " Got tflite model " << endl;
auto ext_delegate_option = TfLiteExternalDelegateOptionsDefault("/usr/lib/libvx_delegate.so");
cout << " Ext delegate options " << endl;
auto ext_delegate_ptr = TfLiteExternalDelegateCreate(&ext_delegate_option);
cout << " Ext delegate pointer " << endl;
if(ext_delegate_ptr == nullptr){
cout << " Ext delegate is null " << endl;
return *inp;
}
tflite::ops::builtin::BuiltinOpResolver resolver;
resolver.AddCustom(kNbgCustomOp, tflite::ops::custom::Register_VSI_NPU_PRECOMPILED());
cout << " Resolver " << endl;
tflite::InterpreterBuilder builder(*model, resolver);
cout << " Builder " << endl;
std::unique_ptr<tflite::Interpreter> interpreter;
cout << " Interpreter " << endl;
//interpreter->SetAllowFp16PrecisionForFp32(true); // commented out because of segmentation fault
// (interpreter is still null here; it can only be called after builder(&interpreter) below)
cout << " Set precision " << endl;
builder(&interpreter); //Output shows: INFO: Created TensorFlow Lite XNNPACK delegate for CPU. ????
cout << " Setup builder and interpreter " << endl;
if (interpreter->ModifyGraphWithDelegate(ext_delegate_ptr) != kTfLiteOk)
{
  cout << " ModifyGraphWithDelegate failed " << endl;
}
cout << " Modifying graph with delegate " << endl;
//tflite::PrintInterpreterState(interpreter.get());
interpreter->AllocateTensors();
cout << " Got model " << endl;
// get input & output layer
TfLiteTensor *input_tensor = interpreter->tensor(interpreter->inputs()[0]);
cout << " Got input " << endl;
const uint HEIGHT = input_tensor->dims->data[1];
const uint WIDTH = input_tensor->dims->data[2];
const uint CHANNEL = input_tensor->dims->data[3];
cout << "H " << HEIGHT << " W " << WIDTH << " C " << CHANNEL << endl;
// read image file
cv::Mat img;
if (inp == NULL)
{
cout << "Getting image from file " << endl;
img = cv::imread(infile);
}
else
{
cout << "Getting image from input " << endl;
img = *inp;
}
cv::Mat inputImg = mat_process(img, WIDTH, HEIGHT);
// flatten rgb image to input layer.
float *inputImg_ptr = inputImg.ptr<float>(0);
memcpy(input_tensor->data.f, inputImg_ptr,
       WIDTH * HEIGHT * CHANNEL * sizeof(float));
interpreter->Invoke();
float *output1 = interpreter->typed_output_tensor<float>(0);
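One way to see whether the delegate actually took over the graph (a sketch against the standard TFLite C++ API; I have not run this against this exact model): ModifyGraphWithDelegate returns a TfLiteStatus, and when delegation succeeds the interpreter's execution plan typically collapses to a small number of delegate-kernel nodes, whereas a CPU fallback leaves the full node count.

```cpp
// Sketch: check whether vx_delegate was applied, instead of ignoring the status.
TfLiteStatus status = interpreter->ModifyGraphWithDelegate(ext_delegate_ptr);
if (status != kTfLiteOk) {
    std::cout << "vx_delegate was NOT applied; running on CPU" << std::endl;
}
// A handful of nodes here suggests the graph was delegated; hundreds suggest CPU fallback.
std::cout << "nodes in execution plan after delegation: "
          << interpreter->execution_plan().size() << std::endl;
```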
C++ output:
Tensorflow Test
Reading image
IMAGE SIZE IS 281776
Reading image
IMAGE SIZE IS 348944
Got tflite model
Ext delegate options
Vx delegate: allowed_builtin_code set to 0.
Vx delegate: error_during_init set to 0.
Vx delegate: error_during_prepare set to 0.
Vx delegate: error_during_invoke set to 0.
Ext delegate pointer
Resolver
Builder
Interpreter
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Setup builder and interpreter
Modifying graph with delegate
Got model
Got input
Got output
Got output score
H 640 W 640 C 3
Getting image from input
Read matrix from file s: 0.0000083760
Creating dst
Creating dst2
Creating dst3
Creating dst4
Creating dst5
Creating dst6
Creating dst7
Got image
Process Matrix to RGB s: 0.1358635630
GOT MEMCPY
W [HandleLayoutInfer:257]Op 19: default layout inference pass.
W [HandleLayoutInfer:257]Op 18: default layout inference pass.
(the two warnings above repeat, 10 lines in total)
W [op_optimize:676]stride slice copy tensor.
(the warning above repeats, 16 lines in total)