<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: i.MX8 Android nnapi unable to run Int8 tflite quantized model on NPU in i.MX Processors</title>
    <link>https://community.nxp.com/t5/i-MX-Processors/i-MX8-Android-nnapi-unable-to-run-Int8-tflite-quantized-model-on/m-p/1704362#M210745</link>
    <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.nxp.com/t5/user/viewprofilepage/user-id/207096"&gt;@brian14&lt;/a&gt;&amp;nbsp;- do you have any updates here? Let us know how we can move this forward. We need a resolution to this as soon as possible&lt;/P&gt;</description>
    <pubDate>Mon, 14 Aug 2023 13:04:09 GMT</pubDate>
    <dc:creator>ajechort14</dc:creator>
    <dc:date>2023-08-14T13:04:09Z</dc:date>
    <item>
      <title>i.MX8 Android nnapi unable to run Int8 tflite quantized model on NPU</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-MX8-Android-nnapi-unable-to-run-Int8-tflite-quantized-model-on/m-p/1676163#M208158</link>
      <description>&lt;P&gt;Using the Android 12.1 release on the i.MX8M Plus SOM, as soon as a TFLite model uses an int8 op, NNAPI forces it to run on the CPU instead of the NPU.&lt;/P&gt;&lt;P&gt;Is this expected behavior? TFLite will quantize/convert to int8 even if the input and output types are UINT8.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;These uint8 models run on the NPU:&lt;/P&gt;&lt;P&gt;&lt;A href="https://tfhub.dev/iree/lite-model/mobilenet_v2_100_224/uint8/1" target="_blank"&gt;https://tfhub.dev/iree/lite-model/mobilenet_v2_100_224/uint8/1&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://tfhub.dev/iree/lite-model/mobilenet_v1_100_224/uint8/1" target="_blank"&gt;https://tfhub.dev/iree/lite-model/mobilenet_v1_100_224/uint8/1&lt;/A&gt;&lt;/P&gt;&lt;P&gt;The second link is the model used in the NXP example:&lt;/P&gt;&lt;P&gt;&lt;A href="https://www.nxp.com/docs/en/user-guide/IMX_ANDROID_TENSORFLOWLITE_USERS_GUIDE.pdf" target="_blank"&gt;https://www.nxp.com/docs/en/user-guide/IMX_ANDROID_TENSORFLOWLITE_USERS_GUIDE.pdf&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://github.com/tensorflow/examples/blob/master/lite/examples/image_classification/android/README.md" target="_blank"&gt;https://github.com/tensorflow/examples/blob/master/lite/examples/image_classification/android/README.md&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Is such an old model used because NNAPI doesn't support the ops used in newer models?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Using this:&amp;nbsp;adb shell setprop debug.nn.vlog 1&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Results in this execution log:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;06-12 13:11:08.721  6953  6953 I GraphDump: digraph {: com.app.app
06-12 13:11:08.721  6953  6953 I GraphDump:     d0 [style=filled fillcolor=black fontcolor=white label="0 = input[0]\nTQ8A(1x224x224x3)"]: com.app.app
06-12 13:11:08.721  6953  6953 I GraphDump:     d1 [label="1: REF\nTQ8A(32x3x3x3)"]: com.app.app
06-12 13:11:08.721  6953  6953 I GraphDump:     d2 [label="2: COPY\nTI32(32)"]: com.app.app
06-12 13:11:08.721  6953  6953 I GraphDump:     d3 [label="3: COPY\nI32 = 1"]: com.app.app
06-12 13:11:08.721  6953  6953 I GraphDump:     d4 [label="4: COPY\nI32 = 2"]: com.app.app
06-12 13:11:08.721  6953  6953 I GraphDump:     d5 [label="5: COPY\nI32 = 2"]: com.app.app
06-12 13:11:08.721  6953  6953 I GraphDump:     d6 [label="6: COPY\nI32 = 3"]: com.app.app
06-12 13:11:08.721  6953  6953 I GraphDump:     d7 [label="7\nTQ8A(1x112x112x32)"]: com.app.app
06-12 13:11:08.721  6953  6953 I GraphDump:     d8 [label="8: REF\nTQ8A(1x3x3x32)"]: com.app.app
06-12 13:11:08.721  6953  6953 I GraphDump:     d9 [label="9: COPY\nTI32(32)"]: com.app.app
06-12 13:11:08.721  6953  6953 I GraphDump:     d10 [label="10: COPY\nI32 = 1"]: com.app.app
06-12 13:11:08.721  6953  6953 I GraphDump:     d11 [label="11: COPY\nI32 = 1"]: com.app.app
06-12 13:11:08.721  6953  6953 I GraphDump:     d12 [label="12: COPY\nI32 = 1"]: com.app.app
06-12 13:11:08.721  6953  6953 I GraphDump:     d13 [label="13: COPY\nI32 = 1"]: com.app.app
06-12 13:11:08.721  6953  6953 I GraphDump:     d14 [label="14: COPY\nI32 = 3"]: com.app.app
06-12 13:11:08.721  6953  6953 I GraphDump:     d15 [label="15\nTQ8A(1x112x112x32)"]: com.app.app
06-12 13:11:08.721  6953  6953 I GraphDump:     d16 [label="16: REF\nTQ8A(64x1x1x32)"]: com.app.app
06-12 13:11:08.721  6953  6953 I GraphDump:     d17 [label="17: REF\nTI32(64)"]: com.app.app
06-12 13:11:08.721  6953  6953 I GraphDump:     d18 [label="18: COPY\nI32 = 1"]: com.app.app
06-12 13:11:08.721  6953  6953 I GraphDump:     d19 [label="19: COPY\nI32 = 1"]: com.app.app
06-12 13:11:08.721  6953  6953 I GraphDump:     d20 [label="20: COPY\nI32 = 1"]: com.app.app
06-12 13:11:08.721  6953  6953 I GraphDump:     d21 [label="21: COPY\nI32 = 3"]: com.app.app
06-12 13:11:08.721  6953  6953 I GraphDump:     d22 [label="22\nTQ8A(1x112x112x64)"]: com.app.app
06-12 13:11:08.722  6953  6953 I GraphDump:     d23 [label="23: REF\nTQ8A(1x3x3x64)"]: com.app.app
06-12 13:11:08.722  6953  6953 I GraphDump:     d24 [label="24: REF\nTI32(64)"]: com.app.app
06-12 13:11:08.722  6953  6953 I GraphDump:     d25 [label="25: COPY\nI32 = 1"]: com.app.app


...................


06-12 13:11:08.745   414   414 I android.hardware.neuralnetworks@1.3-service-vsi-npu-server: getSupportedOperations_1_3: /vendor/bin/hw/android.hardware.neuralnetworks@1.3-service-vsi-n
06-12 13:11:08.757   414   414 I android.hardware.neuralnetworks@1.3-service-vsi-npu-server: : /vendor/bin/hw/android.hardware.neuralnetworks@1.3-service-vsi-n
06-12 13:11:08.757   414   414 I android.hardware.neuralnetworks@1.3-service-vsi-npu-server: getSupportedOperations_1_3 exit: /vendor/bin/hw/android.hardware.neuralnetworks@1.3-service-vsi-n
06-12 13:11:08.758  6953  6953 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(CONV_2D:0) = 0 (vsi-npu): com.app.app
06-12 13:11:08.758  6953  6953 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(DEPTHWISE_CONV_2D:1) = 0 (vsi-npu): com.app.app
06-12 13:11:08.758  6953  6953 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(CONV_2D:2) = 0 (vsi-npu): com.app.app
06-12 13:11:08.758  6953  6953 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(DEPTHWISE_CONV_2D:3) = 0 (vsi-npu): com.app.app
06-12 13:11:08.758  6953  6953 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(CONV_2D:4) = 0 (vsi-npu): com.app.app
06-12 13:11:08.758  6953  6953 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(DEPTHWISE_CONV_2D:5) = 0 (vsi-npu): com.app.app
06-12 13:11:08.758  6953  6953 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(CONV_2D:6) = 0 (vsi-npu): com.app.app
06-12 13:11:08.758  6953  6953 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(DEPTHWISE_CONV_2D:7) = 0 (vsi-npu): com.app.app
06-12 13:11:08.758  6953  6953 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(CONV_2D:8) = 0 (vsi-npu): com.app.app
06-12 13:11:08.758  6953  6953 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(DEPTHWISE_CONV_2D:9) = 0 (vsi-npu): com.app.app
06-12 13:11:08.758  6953  6953 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(CONV_2D:10) = 0 (vsi-npu): com.app.app
06-12 13:11:08.759  6953  6953 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(DEPTHWISE_CONV_2D:11) = 0 (vsi-npu): com.app.app
06-12 13:11:08.759  6953  6953 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(CONV_2D:12) = 0 (vsi-npu): com.app.app
06-12 13:11:08.759  6953  6953 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(DEPTHWISE_CONV_2D:13) = 0 (vsi-npu): com.app.app
06-12 13:11:08.759  6953  6953 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(CONV_2D:14) = 0 (vsi-npu): com.app.app
06-12 13:11:08.759  6953  6953 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(DEPTHWISE_CONV_2D:15) = 0 (vsi-npu): com.app.app
06-12 13:11:08.759  6953  6953 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(CONV_2D:16) = 0 (vsi-npu): com.app.app
06-12 13:11:08.759  6953  6953 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(DEPTHWISE_CONV_2D:17) = 0 (vsi-npu): com.app.app
06-12 13:11:08.759  6953  6953 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(CONV_2D:18) = 0 (vsi-npu): com.app.app
06-12 13:11:08.759  6953  6953 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(DEPTHWISE_CONV_2D:19) = 0 (vsi-npu): com.app.app
06-12 13:11:08.759  6953  6953 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(CONV_2D:20) = 0 (vsi-npu): com.app.app
06-12 13:11:08.759  6953  6953 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(DEPTHWISE_CONV_2D:21) = 0 (vsi-npu): com.app.app
06-12 13:11:08.759  6953  6953 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(CONV_2D:22) = 0 (vsi-npu): com.app.app
06-12 13:11:08.759  6953  6953 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(DEPTHWISE_CONV_2D:23) = 0 (vsi-npu): com.app.app
06-12 13:11:08.759  6953  6953 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(CONV_2D:24) = 0 (vsi-npu): com.app.app
06-12 13:11:08.759  6953  6953 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(DEPTHWISE_CONV_2D:25) = 0 (vsi-npu): com.app.app
06-12 13:11:08.759  6953  6953 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(CONV_2D:26) = 0 (vsi-npu): com.app.app
06-12 13:11:08.759  6953  6953 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(AVERAGE_POOL_2D:27) = 0 (vsi-npu): com.app.app
06-12 13:11:08.759  6953  6953 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(CONV_2D:28) = 0 (vsi-npu): com.app.app
06-12 13:11:08.759  6953  6953 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(RESHAPE:29) = 0 (vsi-npu): com.app.app
06-12 13:11:08.759  6953  6953 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(SOFTMAX:30) = 0 (vsi-npu): com.app.app
06-12 13:11:08.759  6953  6953 I ExecutionPlan: ModelBuilder::partitionTheWork: only one best device: 0 = vsi-npu: com.app.app
06-12 13:11:08.759  6953  6953 I ExecutionPlan: ExecutionPlan::SimpleBody::finish, compilation: com.app.app&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;As you can see, every step executes on the vsi-npu.&lt;/P&gt;&lt;P&gt;This one does not run on the NPU:&lt;/P&gt;&lt;P&gt;&lt;A href="https://tfhub.dev/google/lite-model/qat/mobilenet_v2_retinanet_256/1" target="_blank"&gt;https://tfhub.dev/google/lite-model/qat/mobilenet_v2_retinanet_256/1&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Log:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;06-13 10:07:12.543  3477  3477 I GraphDump: digraph {: com.app.app
06-13 10:07:12.543  3477  3477 I GraphDump:     d0 [style=filled fillcolor=black fontcolor=white label="0 = input[0]\nTQ8A(1x256x256x3)"]: com.app.app
06-13 10:07:12.543  3477  3477 I GraphDump:     d1 [label="1\nTF32(1x256x256x3)"]: com.app.app
06-13 10:07:12.543  3477  3477 I GraphDump:     d2 [label="2\nTENSOR_QUANT8_ASYMM_SIGNED(1x256x256x3)"]: com.app.app
06-13 10:07:12.543  3477  3477 I GraphDump:     d3 [label="3: REF\nTENSOR_QUANT8_SYMM_PER_CHANNEL(32x3x3x3)"]: com.app.app
06-13 10:07:12.543  3477  3477 I GraphDump:     d4 [label="4: COPY\nTI32(32)"]: com.app.app
06-13 10:07:12.543  3477  3477 I GraphDump:     d5 [label="5: COPY\nI32 = 1"]: com.app.app
06-13 10:07:12.543  3477  3477 I GraphDump:     d6 [label="6: COPY\nI32 = 2"]: com.app.app
06-13 10:07:12.543  3477  3477 I GraphDump:     d7 [label="7: COPY\nI32 = 2"]: com.app.app
06-13 10:07:12.543  3477  3477 I GraphDump:     d8 [label="8: COPY\nI32 = 3"]: com.app.app
06-13 10:07:12.543  3477  3477 I GraphDump:     d9 [label="9\nTENSOR_QUANT8_ASYMM_SIGNED(1x128x128x32)"]: com.app.app
06-13 10:07:12.543  3477  3477 I GraphDump:     d10 [label="10: REF\nTENSOR_QUANT8_SYMM_PER_CHANNEL(1x3x3x32)"]: com.app.app
06-13 10:07:12.543  3477  3477 I GraphDump:     d11 [label="11: COPY\nTI32(32)"]: com.app.app
06-13 10:07:12.543  3477  3477 I GraphDump:     d12 [label="12: COPY\nI32 = 1"]: com.app.app
06-13 10:07:12.543  3477  3477 I GraphDump:     d13 [label="13: COPY\nI32 = 1"]: com.app.app
06-13 10:07:12.543  3477  3477 I GraphDump:     d14 [label="14: COPY\nI32 = 1"]: com.app.app
06-13 10:07:12.543  3477  3477 I GraphDump:     d15 [label="15: COPY\nI32 = 1"]: com.app.app
06-13 10:07:12.544  3477  3477 I GraphDump:     d16 [label="16: COPY\nI32 = 3"]: com.app.app
06-13 10:07:12.545  3477  3477 I GraphDump:     d17 [label="17\nTENSOR_QUANT8_ASYMM_SIGNED(1x128x128x32)"]: com.app.app
06-13 10:07:12.545  3477  3477 I GraphDump:     d18 [label="18: REF\nTENSOR_QUANT8_SYMM_PER_CHANNEL(16x1x1x32)"]: com.app.app
06-13 10:07:12.545  3477  3477 I GraphDump:     d19 [label="19: COPY\nTI32(16)"]: com.app.app
06-13 10:07:12.545  3477  3477 I GraphDump:     d20 [label="20: COPY\nI32 = 1"]: com.app.app
06-13 10:07:12.545  3477  3477 I GraphDump:     d21 [label="21: COPY\nI32 = 1"]: com.app.app
06-13 10:07:12.545  3477  3477 I GraphDump:     d22 [label="22: COPY\nI32 = 1"]: com.app.app
06-13 10:07:12.545  3477  3477 I GraphDump:     d23 [label="23: COPY\nI32 = 0"]: com.app.app
06-13 10:07:12.546  3477  3477 I GraphDump:     d24 [label="24\nTENSOR_QUANT8_ASYMM_SIGNED(1x128x128x16)"]: com.app.app
06-13 10:07:12.546  3477  3477 I GraphDump:     d25 [label="25: REF\nTENSOR_QUANT8_SYMM_PER_CHANNEL(96x1x1x16)"]: com.app.app
06-13 10:07:12.546  3477  3477 I GraphDump:     d26 [label="26: REF\nTI32(96)"]: com.app.app
06-13 10:07:12.546  3477  3477 I GraphDump:     d27 [label="27: COPY\nI32 = 1"]: com.app.app
06-13 10:07:12.546  3477  3477 I GraphDump:     d28 [label="28: COPY\nI32 = 1"]: com.app.app
06-13 10:07:12.546  3477  3477 I GraphDump:     d29 [label="29: COPY\nI32 = 1"]: com.app.app
06-13 10:07:12.546  3477  3477 I GraphDump:     d30 [label="30: COPY\nI32 = 3"]: com.app.app
06-13 10:07:12.546  3477  3477 I GraphDump:     d31 [label="31\nTENSOR_QUANT8_ASYMM_SIGNED(1x128x128x96)"]: com.app.app
06-13 10:07:12.546  3477  3477 I GraphDump:     d32 [label="32: REF\nTENSOR_QUANT8_SYMM_PER_CHANNEL(1x3x3x96)"]: com.app.app


........



06-13 10:07:13.659  3477  3477 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(DEQUANTIZE:0) = 0 (vsi-npu): com.app.app
06-13 10:07:13.659  3477  3477 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(QUANTIZE:1) = 0 (vsi-npu): com.app.app
06-13 10:07:13.659  3477  3477 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(CONV_2D:2) = 1 (nnapi-reference): com.app.app
06-13 10:07:13.659  3477  3477 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(DEPTHWISE_CONV_2D:3) = 1 (nnapi-reference): com.app.app
06-13 10:07:13.660  3477  3477 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(CONV_2D:4) = 1 (nnapi-reference): com.app.app
06-13 10:07:13.660  3477  3477 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(CONV_2D:5) = 1 (nnapi-reference): com.app.app
06-13 10:07:13.660  3477  3477 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(DEPTHWISE_CONV_2D:6) = 1 (nnapi-reference): com.app.app
06-13 10:07:13.660  3477  3477 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(CONV_2D:7) = 1 (nnapi-reference): com.app.app
06-13 10:07:13.660  3477  3477 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(CONV_2D:8) = 1 (nnapi-reference): com.app.app
06-13 10:07:13.660  3477  3477 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(DEPTHWISE_CONV_2D:9) = 1 (nnapi-reference): com.app.app
06-13 10:07:13.660  3477  3477 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(CONV_2D:10) = 1 (nnapi-reference): com.app.app
06-13 10:07:13.660  3477  3477 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(ADD:11) = 1 (nnapi-reference): com.app.app
06-13 10:07:13.660  3477  3477 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(CONV_2D:12) = 1 (nnapi-reference): com.app.app
06-13 10:07:13.660  3477  3477 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(DEPTHWISE_CONV_2D:13) = 1 (nnapi-reference): com.app.app
06-13 10:07:13.660  3477  3477 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(CONV_2D:14) = 1 (nnapi-reference): com.app.app
06-13 10:07:13.660  3477  3477 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(CONV_2D:15) = 1 (nnapi-reference): com.app.app
06-13 10:07:13.660  3477  3477 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(DEPTHWISE_CONV_2D:16) = 1 (nnapi-reference): com.app.app
06-13 10:07:13.660  3477  3477 I ExecutionPlan: ModelBuilder::findBestDeviceForEachOperation(CONV_2D:17) = 1 (nnapi-reference): com.app.app&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have also tested this with 224x224 int8 signed models; they also run on the CPU.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;It seems that only TQ8A is supported, not&amp;nbsp;TENSOR_QUANT8_ASYMM_SIGNED? Or is there some other reason the model is not running on the NPU? The NXP documentation makes no mention that only UINT8 ops are supported by NNAPI; it seems that should be noted somewhere, since newer TFLite versions quantize to INT8, not UINT8.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any help or insight would be much appreciated, thank you.&lt;/P&gt;</description>
      <pubDate>Mon, 26 Jun 2023 16:24:49 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-MX8-Android-nnapi-unable-to-run-Int8-tflite-quantized-model-on/m-p/1676163#M208158</guid>
      <dc:creator>drewg</dc:creator>
      <dc:date>2023-06-26T16:24:49Z</dc:date>
    </item>
    <item>
      <title>Re: i.MX8 Android nnapi unable to run Int8 tflite quantized model on NPU</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-MX8-Android-nnapi-unable-to-run-Int8-tflite-quantized-model-on/m-p/1679090#M208387</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.nxp.com/t5/user/viewprofilepage/user-id/213723"&gt;@drewg&lt;/a&gt;,&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have been working on this case using benchmark_model for Android, as described in the i.MX TensorFlow Lite on Android User's Guide.&lt;BR /&gt;Here are the benchmarks.&lt;BR /&gt;&lt;BR /&gt;First, I tried to get benchmarks for Android with the model lite-model_qat_mobilenet_v2_retinanet_256_1.tflite, which you are trying to accelerate on the NPU.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Benchmark for CPU with 4 threads&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Command:&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;./benchmark_model --graph=lite-model_qat_mobilenet_v2_retinanet_256 1.tflite --num_threads=4&lt;/LI-CODE&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Brian_Ibarra_0-1688086066226.png" style="width: 673px;"&gt;&lt;img src="https://community.nxp.com/t5/image/serverpage/image-id/230151i280DAA9F06B5A0FF/image-dimensions/673x208?v=v2" width="673" height="208" role="button" title="Brian_Ibarra_0-1688086066226.png" alt="Brian_Ibarra_0-1688086066226.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;With the above log, you can see the average inference time is &lt;EM&gt;&lt;STRONG&gt;187487us&lt;/STRONG&gt;&lt;/EM&gt;, or &lt;EM&gt;&lt;STRONG&gt;187.5ms&lt;/STRONG&gt;&lt;/EM&gt;.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Benchmark for NPU&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Command:&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;./benchmark_model --graph=lite-model_qat_mobilenet_v2_retinanet_256_1.tflite --use_nnapi=true --nnapi_accelerator_name=vsi-npu&lt;/LI-CODE&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Brian_Ibarra_1-1688086104223.png" style="width: 676px;"&gt;&lt;img src="https://community.nxp.com/t5/image/serverpage/image-id/230152iC623E8F15E2CAF32/image-dimensions/676x239?v=v2" width="676" height="239" role="button" title="Brian_Ibarra_1-1688086104223.png" alt="Brian_Ibarra_1-1688086104223.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;You can see the difference: the NPU average inference time is &lt;EM&gt;&lt;STRONG&gt;355503us&lt;/STRONG&gt;&lt;/EM&gt;, or &lt;EM&gt;&lt;STRONG&gt;355.5ms&lt;/STRONG&gt;&lt;/EM&gt;.&lt;/P&gt;
&lt;P&gt;Command:&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;./benchmark_model --graph=lite-model_qat_mobilenet_v2_retinanet_256_1.tflite --use_nnapi=true --nnapi_accelerator_name=vsi-npu --enable_op_profiling=true &lt;/LI-CODE&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Brian_Ibarra_2-1688086172367.png" style="width: 674px;"&gt;&lt;img src="https://community.nxp.com/t5/image/serverpage/image-id/230153i56E540B8810E3755/image-dimensions/674x369?v=v2" width="674" height="369" role="button" title="Brian_Ibarra_2-1688086172367.png" alt="Brian_Ibarra_2-1688086172367.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;Also, I enabled the profiling option, so you can see the nodes and where each one executes on the CPU and NPU.&lt;BR /&gt;&lt;BR /&gt;With this information, we can see that the model &lt;STRONG&gt;can run on the NPU&lt;/STRONG&gt;, but with poor performance (CPU 187.5ms vs. NPU 355.5ms) related to the Android ecosystem for TensorFlow Lite.&lt;BR /&gt;&lt;BR /&gt;I will work on this issue and update you as soon as possible.&lt;BR /&gt;&lt;BR /&gt;Have a great day!&lt;/P&gt;</description>
      <pubDate>Fri, 30 Jun 2023 00:52:33 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-MX8-Android-nnapi-unable-to-run-Int8-tflite-quantized-model-on/m-p/1679090#M208387</guid>
      <dc:creator>brian14</dc:creator>
      <dc:date>2023-06-30T00:52:33Z</dc:date>
    </item>
    <item>
      <title>Re: i.MX8 Android nnapi unable to run Int8 tflite quantized model on NPU</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-MX8-Android-nnapi-unable-to-run-Int8-tflite-quantized-model-on/m-p/1704360#M210744</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.nxp.com/t5/user/viewprofilepage/user-id/207096"&gt;@brian14&lt;/a&gt;&amp;nbsp;- do you have any updates here? Let us know what can be done to move this forward as we need a resolution here as soon as possible&lt;/P&gt;</description>
      <pubDate>Mon, 14 Aug 2023 13:03:09 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-MX8-Android-nnapi-unable-to-run-Int8-tflite-quantized-model-on/m-p/1704360#M210744</guid>
      <dc:creator>ajechort14</dc:creator>
      <dc:date>2023-08-14T13:03:09Z</dc:date>
    </item>
    <item>
      <title>Re: i.MX8 Android nnapi unable to run Int8 tflite quantized model on NPU</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-MX8-Android-nnapi-unable-to-run-Int8-tflite-quantized-model-on/m-p/1704362#M210745</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.nxp.com/t5/user/viewprofilepage/user-id/207096"&gt;@brian14&lt;/a&gt;&amp;nbsp;- do you have any updates here? Let us know how we can move this forward. We need a resolution to this as soon as possible&lt;/P&gt;</description>
      <pubDate>Mon, 14 Aug 2023 13:04:09 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-MX8-Android-nnapi-unable-to-run-Int8-tflite-quantized-model-on/m-p/1704362#M210745</guid>
      <dc:creator>ajechort14</dc:creator>
      <dc:date>2023-08-14T13:04:09Z</dc:date>
    </item>
    <item>
      <title>Re: i.MX8 Android nnapi unable to run Int8 tflite quantized model on NPU</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-MX8-Android-nnapi-unable-to-run-Int8-tflite-quantized-model-on/m-p/1704473#M210759</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.nxp.com/t5/user/viewprofilepage/user-id/221660"&gt;@ajechort14&lt;/a&gt;,&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This issue was escalated to our internal team and here is the answer:&lt;/P&gt;
&lt;P&gt;"It is confirmed that this is a limitation with Android’s NNAPI delegate, not an issue with the model and/or the OS.&lt;/P&gt;
&lt;P&gt;This is an explanation from official documentation: &lt;A href="https://www.tensorflow.org/lite/android/delegates/nnapi" target="_blank"&gt;https://www.tensorflow.org/lite/android/delegates/nnapi&lt;/A&gt;&lt;BR /&gt;If the NNAPI delegate does not support some of the ops or parameter combinations in a model, the framework only runs the supported parts of the graph on the accelerator. The remainder runs on the CPU, which results in split execution. Due to the high cost of CPU/accelerator synchronization, this may result in slower performance than executing the whole network on the CPU alone."&lt;/P&gt;
&lt;P&gt;Have a great day!&lt;/P&gt;</description>
      <pubDate>Mon, 14 Aug 2023 18:14:34 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-MX8-Android-nnapi-unable-to-run-Int8-tflite-quantized-model-on/m-p/1704473#M210759</guid>
      <dc:creator>brian14</dc:creator>
      <dc:date>2023-08-14T18:14:34Z</dc:date>
    </item>
    <item>
      <title>Re: i.MX8 Android nnapi unable to run Int8 tflite quantized model on NPU</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-MX8-Android-nnapi-unable-to-run-Int8-tflite-quantized-model-on/m-p/1715134#M211811</link>
      <description>&lt;P&gt;Hello - we are still having trouble getting our model to run on the NPU. Can we schedule a call with you or someone from NXP to discuss?&lt;/P&gt;</description>
      <pubDate>Thu, 31 Aug 2023 14:11:19 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-MX8-Android-nnapi-unable-to-run-Int8-tflite-quantized-model-on/m-p/1715134#M211811</guid>
      <dc:creator>ajechort14</dc:creator>
      <dc:date>2023-08-31T14:11:19Z</dc:date>
    </item>
    <item>
      <title>Re: i.MX8 Android nnapi unable to run Int8 tflite quantized model on NPU</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-MX8-Android-nnapi-unable-to-run-Int8-tflite-quantized-model-on/m-p/1715284#M211823</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.nxp.com/t5/user/viewprofilepage/user-id/221660"&gt;@ajechort14&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;Sorry, but we don't offer this service.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Is there a specific reason to run your model through the Android platform?&lt;/P&gt;
&lt;P&gt;We tested this model on our BSP with correct performance.&lt;/P&gt;
&lt;P&gt;Thank you and have a wonderful day!&lt;/P&gt;</description>
      <pubDate>Thu, 31 Aug 2023 17:26:54 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-MX8-Android-nnapi-unable-to-run-Int8-tflite-quantized-model-on/m-p/1715284#M211823</guid>
      <dc:creator>brian14</dc:creator>
      <dc:date>2023-08-31T17:26:54Z</dc:date>
    </item>
  </channel>
</rss>