Hello Community,
I am running AI inference on an i.MX8QM MEK board with BSP 5.4.3_2.0.0. I have a custom TFLite application written in C++ that runs inference on MobileNet / MobileNet-SSD models.
The application does appear to use GPU/CPU NEON acceleration, since GPU-accelerated inference is roughly 4x faster than CPU-only computation.
However, the problem shows up when I compare the "label_image" sample application with my custom application.
Model used: MobileNet 0.25 (128x128), quantized
The GPU-accelerated inference times are as follows:
1. "label_image" sample application - 1.6 ms
2. custom application - 11 ms
The CPU (NEON-accelerated) inference times are as follows:
1. "label_image" sample application - 2.7 ms
2. custom application - 56 ms
I cannot understand where this difference is coming from. One observation when using GPU acceleration: with the "label_image" sample application, the console shows "INFO: Created TensorFlow Lite delegate for NNAPI." followed by "Applied NNAPI delegate." and "invoked". However, with my custom application, it shows "INFO: Created TensorFlow Lite delegate for NNAPI." followed by "NNAPI acceleration is unsupported on this platform."
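For reference, my understanding is that the label_image sample prints those messages because it obtains its delegate from tflite::evaluation::CreateNNAPIDelegate() (tensorflow/lite/tools/evaluation/utils.h) rather than constructing one directly. Below is only a rough sketch of that path from my reading of the TF Lite 1.x sources; the exact code shipped with BSP 5.4.3_2.0.0 may differ:

#include <iostream>
#include <map>
#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/tools/evaluation/utils.h"

// Same aliases as used in label_image (and in my snippet further below).
using TfLiteDelegatePtr = tflite::Interpreter::TfLiteDelegatePtr;
using TfLiteDelegatePtrMap = std::map<std::string, TfLiteDelegatePtr>;

// Sketch: label_image asks the evaluation utils for the NNAPI delegate.
// CreateNNAPIDelegate() wraps NnApiDelegate() with a no-op deleter and returns
// a null TfLiteDelegatePtr when NNAPI is not available, which is where the
// "unsupported on this platform" message originates.
TfLiteDelegatePtrMap GetDelegates() {
  TfLiteDelegatePtrMap delegates;
  auto delegate = tflite::evaluation::CreateNNAPIDelegate();
  if (!delegate) {
    std::cout << "NNAPI acceleration is unsupported on this platform.";
  } else {
    delegates.emplace("NNAPI", std::move(delegate));
  }
  return delegates;
}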
The code snippet I am using for this is as below:
// Load the model and build the interpreter.
unique_ptr<tflite::FlatBufferModel> model =
    tflite::FlatBufferModel::BuildFromFile(get_modelPath().c_str());
tflite::ops::builtin::BuiltinOpResolver resolver;
unique_ptr<tflite::Interpreter> interpreter;
tflite::InterpreterBuilder(*model, resolver)(&interpreter);
interpreter->UseNNAPI(true);
interpreter->SetNumThreads(2);

// Build the delegate map. The delegate is constructed from a nullptr with a
// no-op deleter.
TfLiteDelegatePtrMap delegates_;
auto delegate = TfLiteDelegatePtr(nullptr, [](TfLiteDelegate*) {});
if (!delegate) {
  cout << "NNAPI acceleration is unsupported on this platform.";
} else {
  delegates_.emplace("NNAPI", std::move(delegate));
}

// Apply every delegate in the map to the graph.
for (const auto& delegate : delegates_) {
  if (interpreter->ModifyGraphWithDelegate(delegate.second.get()) != kTfLiteOk) {
    cout << "Failed to apply " << delegate.first << " delegate.";
  } else {
    cout << "Applied " << delegate.first << " delegate.";
  }
}

// Allocate tensors, copy the resized input image into the input tensor and
// run inference.
interpreter->AllocateTensors();
memcpy(interpreter->typed_input_tensor<uchar>(0), resized_image.data,
       resized_image.total() * resized_image.elemSize());
interpreter->Invoke();  // the time is measured for this function call
I have tried to find the cause, but no luck so far. Any help is much appreciated. :-)
Best Regards