Hi all,
I am trying to run an SSD MobileNet V2 .tflite model (trained and converted with the Object Detection API) on the IMX8MP EVK.
Here is the benchmark_model result for the .tflite:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Operator-wise Profiling Info for Regular Benchmark Runs:
============================== Run Order ==============================
[node type] | [start] | [first] | [avg ms] | [%] | [cdf%] | [mem KB] | [times called] |
QUANTIZE | 0 | 0.615 | 0.607 | 2.66% | 2.66% | 0 | 1 |
TfLiteNnapiDelegate | 0.607 | 18.077 | 17.979 | 78.89% | 81.56% | 0 | 1 |
DEQUANTIZE | 18.587 | 0.025 | 0.025 | 0.11% | 81.67% | 0 | 1 |
DEQUANTIZE | 18.613 | 0.473 | 0.474 | 2.08% | 83.75% | 0 | 1 |
TFLite_Detection_PostProcess | 19.087 | 3.746 | 3.704 | 16.25% | 100.00% | 0 | 1 |
============================== Top by Computation Time ==============================
[node type] | [start] | [first] | [avg ms] | [%] | [cdf%] | [mem KB] | [times called] |
TfLiteNnapiDelegate | 0.607 | 18.077 | 17.979 | 78.89% | 78.89% | 0 | 1 |
TFLite_Detection_PostProcess | 19.087 | 3.746 | 3.704 | 16.25% | 95.15% | 0 | 1 |
QUANTIZE | 0 | 0.615 | 0.607 | 2.66% | 97.81% | 0 | 1 |
DEQUANTIZE | 18.613 | 0.473 | 0.474 | 2.08% | 99.89% | 0 | 1 |
DEQUANTIZE | 18.587 | 0.025 | 0.025 | 0.11% | 100.00% | 0 | 1 |
Number of nodes executed: 5
============================== Summary by node type ==============================
[Node type] | [count] | [avg ms] | [avg %] | [cdf %] | [mem KB] | [times called] |
TfLiteNnapiDelegate | 1 | 17.978 | 78.90% | 78.90% | 0 | 1 |
TFLite_Detection_PostProcess | 1 | 3.704 | 16.26% | 95.15% | 0 | 1 |
QUANTIZE | 1 | 0.606 | 2.66% | 97.81% | 0 | 1 |
DEQUANTIZE | 2 | 0.499 | 2.19% | 100.00% | 0 | 2 |
Timings (microseconds): count=50 first=22936 curr=22800 min=22694 max=22972 avg=22789 std=58
Memory (bytes): count=0
5 nodes observed
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
It seems that the post-processing step (TFLite_Detection_PostProcess) is executed on the CPU, not on the NPU through NNAPI.
If so, this behavior appears to differ from the Processor Reference Manual, which explains that the PPU in the NPU should execute the post-processing (non-max suppression).
I am also seeing an unexpected processing delay in the post-processing step on the IMX8MP EVK: about 3.7 ms per inference, roughly 16% of the total.
So I need a countermeasure for this behavior.
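
To make the split easier to see, here is a minimal Python sketch that sums the time spent outside the NNAPI delegate, using the rows from the "Summary by node type" table above. It assumes that every node not folded into TfLiteNnapiDelegate ran on the regular TFLite CPU kernels. The helper names (parse_summary, split_by_delegate) are only illustrative and are not part of any TFLite tool.

```python
# Rows copied from the "Summary by node type" table of the benchmark_model log.
SUMMARY = """\
TfLiteNnapiDelegate | 1 | 17.978 | 78.90% | 78.90% | 0 | 1 |
TFLite_Detection_PostProcess | 1 | 3.704 | 16.26% | 95.15% | 0 | 1 |
QUANTIZE | 1 | 0.606 | 2.66% | 97.81% | 0 | 1 |
DEQUANTIZE | 2 | 0.499 | 2.19% | 100.00% | 0 | 2 |
"""


def parse_summary(text):
    """Return {node type: avg ms} from pipe-separated summary rows."""
    rows = {}
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        if len(cells) < 3 or not cells[0]:
            continue  # skip blank or malformed lines
        rows[cells[0]] = float(cells[2])  # third column is [avg ms]
    return rows


def split_by_delegate(rows, delegate_name="TfLiteNnapiDelegate"):
    """Split timings into the delegate node and everything else.

    Assumption: nodes other than the delegate node ran on the CPU.
    """
    delegated_ms = rows.get(delegate_name, 0.0)
    cpu_ops = {name: ms for name, ms in rows.items() if name != delegate_name}
    return delegated_ms, cpu_ops


rows = parse_summary(SUMMARY)
delegated_ms, cpu_ops = split_by_delegate(rows)
cpu_ms = sum(cpu_ops.values())
total_ms = delegated_ms + cpu_ms

print(f"NNAPI delegate: {delegated_ms:.3f} ms")
print(f"Outside delegate: {cpu_ms:.3f} ms ({100 * cpu_ms / total_ms:.1f}% of total)")
for name, ms in sorted(cpu_ops.items(), key=lambda item: item[1], reverse=True):
    print(f"  {name}: {ms:.3f} ms")
```

With these numbers, TFLite_Detection_PostProcess is by far the largest node outside the delegate, which is why I suspect it is running on the CPU.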
Thanks in advance.
Best Regards,
Hi all,
Sorry for the double post.
This post was flagged as spam, so I posted it twice as a test.