Error when creating NN API delegate on i.MX8qmmek. "NNAPI acceleration is unsupported on this platform"


ullasbharadwaj
Contributor III

Hello Community,

I am running AI inference on the i.MX8qmmek with BSP 5.4.3_2.0.0. I have a custom TfLite application written in C++ that runs inference on MobileNet and MobileNet+SSD models.

The application does seem to use GPU/CPU NEON acceleration, as inference is almost 4x faster with the GPU than with CPU-only computation.

However, the problem comes when I compare the "label_image" application with my custom application.

Model used: MobileNet 0.25 (128x128), quantized

The GPU-accelerated inference times are as follows:
1. "label_image" sample application - 1.6 ms
2. custom application - 11 ms

The CPU NEON-accelerated inference times are as follows:
1. "label_image" sample application - 2.7 ms
2. custom application - 56 ms

I cannot understand where this difference comes from. One observation when using GPU acceleration: with the "label_image" sample application, the console shows "INFO: Created TensorFlow Lite delegate for NNAPI. Applied NNAPI delegate." followed by "invoked". With my custom application, however, it shows "INFO: Created TensorFlow Lite delegate for NNAPI. NNAPI acceleration is unsupported on this platform."

The code snippet I am using for this is as below:

using TfLiteDelegatePtr = tflite::Interpreter::TfLiteDelegatePtr;
using TfLiteDelegatePtrMap = std::map<std::string, TfLiteDelegatePtr>;

unique_ptr<tflite::FlatBufferModel> model = tflite::FlatBufferModel::BuildFromFile(get_modelPath().c_str());
tflite::ops::builtin::BuiltinOpResolver resolver;
unique_ptr<tflite::Interpreter> interpreter;
tflite::InterpreterBuilder(*model.get(), resolver)(&interpreter);

interpreter->UseNNAPI(true);
interpreter->SetNumThreads(2);

TfLiteDelegatePtrMap delegates_;

auto delegate = TfLiteDelegatePtr(nullptr, [](TfLiteDelegate*) {});
if (!delegate) {
   cout << "NNAPI acceleration is unsupported on this platform.";
} else {
   delegates_.emplace("NNAPI", std::move(delegate));
}

for (const auto& delegate : delegates_) {
   if (interpreter->ModifyGraphWithDelegate(delegate.second.get()) != kTfLiteOk) {
      cout << "Failed to apply " << delegate.first << " delegate.";
   } else {
      cout << "Applied " << delegate.first << " delegate.";
   }
}

interpreter->AllocateTensors();

memcpy(interpreter->typed_input_tensor<uchar>(0), resized_image.data, resized_image.total() * resized_image.elemSize());

interpreter->Invoke(); // Inference time is measured for this call.
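
For reference, the inference time above is measured around the Invoke() call, roughly like this (a simplified sketch using std::chrono, not the exact code from my application):

// Only the Invoke() call itself is timed (requires <chrono>).
auto t0 = std::chrono::steady_clock::now();
interpreter->Invoke();
auto t1 = std::chrono::steady_clock::now();
double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
cout << "Inference time: " << ms << " ms" << endl;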

I have tried to understand the cause, but with no luck so far. Help is much appreciated. :-)

Best Regards

7 Replies

Marco_Zaccheria
NXP Employee

Hi Ullas,

I see you are using the following code:

auto delegate = TfLiteDelegatePtr(nullptr, [](TfLiteDelegate*) {});

That is the code that fails: the delegate you construct there is a NULL pointer, so the !delegate check always reports that NNAPI is unsupported.

Have you tried allocating a real TfLiteDelegate instead?
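
For example, something along these lines should give you a real delegate (just a minimal sketch; it assumes the TFLite build on the BSP exposes tflite::NnApiDelegate(), as used by the label_image example, and that TfLiteDelegatePtr is the alias from your code):

// tflite::NnApiDelegate() returns a pointer to a statically owned NNAPI delegate,
// so a no-op deleter is fine here. The exact header depends on the TFLite version
// (for example tensorflow/lite/delegates/nnapi/nnapi_delegate.h in recent releases).
auto delegate = TfLiteDelegatePtr(tflite::NnApiDelegate(), [](TfLiteDelegate*) {});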

Thanks

   Marco

ullasbharadwaj
Contributor III

Hello Marco,

Thanks for your reply.

You are right. I used:

auto delegate = TfLiteDelegatePtr(tflite::NnApiDelegate(), [](TfLiteDelegate*) {});

However, the inference time (for interpreter->Invoke()) is drastically worse for non-quantized SSD models than on the previous BSP version (5.4.3_2.0.0), where I simply used UseNNAPI(true). So is there something wrong in the delegate-creation code below? Can you please tell me how you are enabling it?

Here is the summary....

1. BSP Version: 5.4.3_2.0.0

      Enabling Acceleration: interpreter->UseNNAPI(true)

      Result: All models run perfectly fine

2. BSP Version: 5.4.24_2.1.0

      Enabling Acceleration: using the code below

using TfLiteDelegatePtr = tflite::Interpreter::TfLiteDelegatePtr;
using TfLiteDelegatePtrMap = std::map<std::string, TfLiteDelegatePtr>;

TfLiteDelegatePtrMap delegates_;

auto delegate = TfLiteDelegatePtr(tflite::NnApiDelegate(), [](TfLiteDelegate*) {});

if (!delegate) {
   cout << "NNAPI acceleration is unsupported on this platform.";
} else {
   delegates_.emplace("NNAPI", std::move(delegate));
}

for (const auto& delegate : delegates_) {
   if (interpreter->ModifyGraphWithDelegate(delegate.second.get()) != kTfLiteOk) {
      cout << "Failed to apply " << delegate.first << " delegate.";
   } else {
      cout << "Applied " << delegate.first << " delegate.";
   }
}

Result: SSD models, especially non-quantized ones, take drastically longer. For example, SSD MobileNet v2 COCO takes > 500 ms per inference.

Best Regards

Ullas Bharadwaj

Marco_Zaccheria
NXP Employee

Hi Ullas,

In the example mentioned, the delegate is only used to figure out which acceleration is available on the platform the example runs on.

We will look into why the code is not working, but for the sake of your test you can avoid using the delegate and just do something like:

interpreter->UseNNAPI(get_useGPU());
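
In context that would look roughly like this (a simplified sketch; get_useGPU() stands for your own helper that returns a bool):

// Build the interpreter as usual, then enable NNAPI directly instead of going
// through the delegate map (this assumes the legacy UseNNAPI() call is still
// available in the TFLite version on your BSP):
interpreter->UseNNAPI(get_useGPU());
interpreter->SetNumThreads(2);
interpreter->AllocateTensors();
interpreter->Invoke();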

Please let me know whether it resolves your issue.

In the meantime, we are also checking whether the original example code needs to be modified.

Thank you

Best Regards

   Marco

ullasbharadwaj
Contributor III

Hello Marco,

Thanks for the reply.

Yes, that is what I was doing on BSP 5.4.3_2.0.0 and it worked fine. Only on 5.4.24_2.1.0 do I get the error "UseNNAPI is not supported. Use ModifyGraphWithDelegate instead."

(I am currently not at the target, so the exact wording may be slightly off.)
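
So on 5.4.24 the only working path seems to be applying the delegate explicitly, roughly like this (a condensed sketch of the code I posted above, assuming tflite::NnApiDelegate() is available there):

auto delegate = TfLiteDelegatePtr(tflite::NnApiDelegate(), [](TfLiteDelegate*) {});
// Hand the graph to the NNAPI delegate instead of calling UseNNAPI().
if (interpreter->ModifyGraphWithDelegate(delegate.get()) != kTfLiteOk) {
   cout << "Failed to apply NNAPI delegate.";
}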

Best Regards

Ullas Bharadwaj

Marco_Zaccheria
NXP Employee

Hi Ullas,

Could you please send me the full log showing the issue with 5.4.24?

Furthermore, have you already checked the up-to-date version of the label_image example?

Thank you

   Marco

manish_bajaj
NXP Employee

nxf60449,

Can you check this and update the ticket?

-Manish

Alifer_Moraes
NXP Employee

Hello manishbajaj,

Sure, I'll take a look and let you know.

Alifer