Hi all,
I’ve tried running an SSD MobileNet V2 .tflite model (trained and converted through the Object Detection API) on the IMX8MP EVK.
Here is the benchmark_model result for the .tflite model:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Operator-wise Profiling Info for Regular Benchmark Runs:
============================== Run Order ==============================
[node type] | [start] | [first] | [avg ms] | [%] | [cdf%] | [mem KB] | [times called] |
QUANTIZE | 0 | 0.615 | 0.607 | 2.66% | 2.66% | 0 | 1 |
TfLiteNnapiDelegate | 0.607 | 18.077 | 17.979 | 78.89% | 81.56% | 0 | 1 |
DEQUANTIZE | 18.587 | 0.025 | 0.025 | 0.11% | 81.67% | 0 | 1 |
DEQUANTIZE | 18.613 | 0.473 | 0.474 | 2.08% | 83.75% | 0 | 1 |
TFLite_Detection_PostProcess | 19.087 | 3.746 | 3.704 | 16.25% | 100.00% | 0 | 1 |
============================== Top by Computation Time ==============================
[node type] | [start] | [first] | [avg ms] | [%] | [cdf%] | [mem KB] | [times called] |
TfLiteNnapiDelegate | 0.607 | 18.077 | 17.979 | 78.89% | 78.89% | 0 | 1 |
TFLite_Detection_PostProcess | 19.087 | 3.746 | 3.704 | 16.25% | 95.15% | 0 | 1 |
QUANTIZE | 0 | 0.615 | 0.607 | 2.66% | 97.81% | 0 | 1 |
DEQUANTIZE | 18.613 | 0.473 | 0.474 | 2.08% | 99.89% | 0 | 1 |
DEQUANTIZE | 18.587 | 0.025 | 0.025 | 0.11% | 100.00% | 0 | 1 |
Number of nodes executed: 5
============================== Summary by node type ==============================
[Node type] | [count] | [avg ms] | [avg %] | [cdf %] | [mem KB] | [times called] |
TfLiteNnapiDelegate | 1 | 17.978 | 78.90% | 78.90% | 0 | 1 |
TFLite_Detection_PostProcess | 1 | 3.704 | 16.26% | 95.15% | 0 | 1 |
QUANTIZE | 1 | 0.606 | 2.66% | 97.81% | 0 | 1 |
DEQUANTIZE | 2 | 0.499 | 2.19% | 100.00% | 0 | 2 |
Timings (microseconds): count=50 first=22936 curr=22800 min=22694 max=22972 avg=22789 std=58
Memory (bytes): count=0
5 nodes observed
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
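It seems that the post-processing (TFLite_Detection_PostProcess) is executed on the CPU cores rather than on the NPU (NNAPI), since it shows up as a separate node outside TfLiteNnapiDelegate.
If so, I assume this behavior differs from the Processor Reference Manual, which explains that the PPU in the NPU should execute the post-processing (non-max suppression).
I also see an unexpected processing delay in the post-processing on the IMX8MP EVK: about 3.7 ms per inference, roughly 16% of the total.
So I need a countermeasure for this behavior.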
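For reference, the time spent outside the NNAPI delegate can be estimated from the "Summary by node type" rows of the log above. This is only a minimal sketch: the function name `split_latency` is hypothetical, and it assumes that every node other than TfLiteNnapiDelegate runs on the CPU, which benchmark_model itself does not state.

```python
# Data rows copied from the "Summary by node type" section of the log above.
SUMMARY = """\
TfLiteNnapiDelegate | 1 | 17.978 | 78.90% | 78.90% | 0 | 1 |
TFLite_Detection_PostProcess | 1 | 3.704 | 16.26% | 95.15% | 0 | 1 |
QUANTIZE | 1 | 0.606 | 2.66% | 97.81% | 0 | 1 |
DEQUANTIZE | 2 | 0.499 | 2.19% | 100.00% | 0 | 2 |
"""


def split_latency(summary, delegate="TfLiteNnapiDelegate"):
    """Return (delegate_ms, other_ms) summed over the summary rows.

    Assumption: nodes not named `delegate` are executed on the CPU.
    """
    delegate_ms = 0.0
    other_ms = 0.0
    for line in summary.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 3:
            continue  # not a data row
        try:
            avg_ms = float(fields[2])  # third column is [avg ms]
        except ValueError:
            continue  # header or separator line
        if fields[0] == delegate:
            delegate_ms += avg_ms
        else:
            other_ms += avg_ms
    return delegate_ms, other_ms


npu_ms, cpu_ms = split_latency(SUMMARY)
share = 100 * cpu_ms / (npu_ms + cpu_ms)
print(f"delegate: {npu_ms:.3f} ms, other nodes: {cpu_ms:.3f} ms ({share:.1f}% of total)")
```

With the numbers from this run, about 4.8 ms of the roughly 22.8 ms per inference falls outside the delegate, most of it in TFLite_Detection_PostProcess.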
Thanks in advance.
Best Regards,
Hi all,
Sorry for the double post.
The post on the left was flagged as spam, so I posted it twice as a test.