TFLite FullyConnected Layers on iMX8MPlus using NNAPI

We are using the iMX8MPlus module and are trying to deploy a self-developed TensorFlow Lite model on the NPU.

The TFLite version we are using is lf-5.15.5_1.0.0 (the NXP-customized version). We are trying to use the NNAPI delegate to run the model on the NPU. So far it works for all the convolutional layers, but the FULLY_CONNECTED layers are not being delegated.

The error shown is: Node 0 Operator Builtin Code 9 FULLY_CONNECTED (not delegated). Digging deeper into the source code, we found that it fails at the line if (builtin->keep_num_dims) in the delegate's validation code for this operator:

    case kTfLiteBuiltinFullyConnected: {
      ExpectMaxOpVersion(version, 5, &val_ctx);
      const auto output_type = context->tensors[node->outputs->data[0]].type;
      Expect(output_type != kTfLiteInt16,
             NNAPIValidationFailureType::kUnsupportedOutputType,
             "Unsupported output of type kTfLiteInt16", &val_ctx);
      if (android_sdk_version < kMinSdkVersionForNNAPI12) {
        Expect(!IsHybridOperator(context, builtin_code, node),
               NNAPIValidationFailureType::kUnsupportedHybridOperator,
               "Hybrid operators not supported before NNAPI 1.2", &val_ctx);
        ExpectIsFloatOrUint8Operator(context, node, &val_ctx);
      }
      const auto input_type = context->tensors[node->inputs->data[0]].type;
      if (android_sdk_version < kMinSdkVersionForNNAPI12 &&
          input_type == kTfLiteUInt8) {
        ExpectIsRestrictedScalesCompliant(context, node, &val_ctx);
      }
      auto builtin =
          reinterpret_cast<TfLiteFullyConnectedParams*>(node->builtin_data);
      // keep_num_dims is only accepted from NNAPI 1.3 (Android SDK 30) onwards.
      if (builtin->keep_num_dims) {
        ExpectMinAndroidSdkVersion(android_sdk_version,
                                   kMinSdkVersionForNNAPI13, &val_ctx);
      }
    } break;

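So if the keep_num_dims flag is true, the delegate requires an Android SDK version >= 30 (kMinSdkVersionForNNAPI13), whereas in this version of TFLite (the NXP-customized version) the Android SDK version is defined as 29. The FULLY_CONNECTED node therefore fails validation and falls back to the CPU.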

We have been trying to figure out how to control the keep_num_dims flag in the model we are developing, but we cannot find many resources about it. Has anyone faced this issue before and found a way to set the flag to false?

