
Error when creating NN API delegate on i.MX8qmmek. "NNAPI acceleration is unsupported on this platform"

Question asked by Ullas Bharadwaj on Jun 29, 2020

Hello Community,

 

I am running AI inference on i.MX8qmmek with BSP 5.4.3_2.0.0. I have a custom TfLite application written in C++ to run inference on MobileNet / MobileNet+SSD models.

 

The application seems to use GPU / CPU NEON acceleration, as inference is almost 4x faster with the GPU than with CPU-only computation.

 

However, the problem comes when I compare the "label_image" application with my custom application.

 

Model used: mobilenet 0.25 (128x128) quantized 

 

The GPU accelerated inference times are as follows:
1. "label_image" sample application - 1.6 ms

2. custom application - 11 ms

 

The CPU neon accelerated inference times are as follows:
1. "label_image" sample application - 2.7 ms

2. custom application - 56 ms

 

I cannot understand where this difference comes from. One observation when using GPU acceleration: with the "label_image" sample application, the console shows

INFO: Created TensorFlow Lite delegate for NNAPI.
Applied NNAPI delegate.
invoked.

However, with my custom application, it shows

INFO: Created TensorFlow Lite delegate for NNAPI.
NNAPI acceleration is unsupported on this platform.

 

The code snippet I am using for this is as below:

 

unique_ptr<tflite::FlatBufferModel> model =
    tflite::FlatBufferModel::BuildFromFile(get_modelPath().c_str());

tflite::ops::builtin::BuiltinOpResolver resolver;
unique_ptr<tflite::Interpreter> interpreter;
tflite::InterpreterBuilder(*model, resolver)(&interpreter);

interpreter->UseNNAPI(true);
interpreter->SetNumThreads(2);

TfLiteDelegatePtrMap delegates_;

// Delegate pointer is constructed from nullptr with a no-op deleter.
auto delegate = TfLiteDelegatePtr(nullptr, [](TfLiteDelegate*) {});
if (!delegate) {
    cout << "NNAPI acceleration is unsupported on this platform.";
} else {
    delegates_.emplace("NNAPI", std::move(delegate));
}

for (const auto& delegate : delegates_) {
    if (interpreter->ModifyGraphWithDelegate(delegate.second.get()) != kTfLiteOk) {
        cout << "Failed to apply " << delegate.first << " delegate.";
    } else {
        cout << "Applied " << delegate.first << " delegate.";
    }
}

interpreter->AllocateTensors();

memcpy(interpreter->typed_input_tensor<uchar>(0), resized_image.data,
       resized_image.total() * resized_image.elemSize());

interpreter->Invoke(); // time is measured around this call

 

I have tried to understand the cause, but no luck so far. Help is much appreciated. :-)

 

Best Regards
