I have written an issue on the TIM-VX side, but it may actually concern i.MX as well: GitHub issue
To summarize, I am concerned about not getting any per-op details from the tflite benchmark app when using the VX delegate. How could I get that information to check where the bottleneck is? Note that I do get the details when running on the CPU, but the inference times for different model sizes on the CPU and NPU are not proportional and do not follow the same behavior in terms of time, so no conclusion can be drawn from them.
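For context, this is roughly how I invoke the benchmark. The model path is a placeholder, and the delegate location is an assumption based on a typical i.MX Yocto image (it may differ on yours); the call is guarded so the sketch is a no-op where the binary is not installed:

```shell
# Sketch of the benchmark invocation (paths are placeholders/assumptions).
MODEL=/path/to/model.tflite          # hypothetical model path
DELEGATE=/usr/lib/libvx_delegate.so  # assumed VX delegate location on i.MX images
if command -v benchmark_model >/dev/null 2>&1; then
  # --enable_op_profiling prints the per-op breakdown; this works on CPU,
  # but with the external VX delegate the per-op details are missing.
  benchmark_model \
    --graph="$MODEL" \
    --external_delegate_path="$DELEGATE" \
    --enable_op_profiling=true
fi
```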
Thanks!
Hi @Antoine_B,
Thank you for sharing the link to GitHub. I will review the link.
Based on the i.MX Machine Learning User's Guide, the application cannot show per-op details.
I suggest using our eIQ Toolkit to profile your model. To access the eIQ documentation, click eIQ Portal > Help > eIQ Documentation, then select eIQ_Toolkit_UG.pdf and go to section 4.1, Model Profiling.
Please let me know if this works.
Best regards, Brian.
Thanks @Brian_Ibarra.
I got familiar with your modelrunner in the latest Linux 6.1.1_1.0.0 version. Indeed, I can see the --profiling option in this tool; however, I am encountering some trouble. I am running the tool directly on the board, an i.MX8MP, with the tflite engine:
It turns out that for points 1 and 2, I had misread the documentation (see the picture attached below): even the latest versions (I have 6.1.1_1.0.0) do not include the profiling feature, and I had to replace the binary with the one provided by the eIQ Toolkit. This is now done, and indeed I can access the profiling graph, even with the VX delegate, using the following command line on the target board:
$ modelrunner -d prof -H 10818 -e tflite -c 3
And then adding the device http://<ip>:10818 on the remote laptop within the Model Tool GUI.
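As a quick sanity check before adding the device in the GUI, one can verify that the modelrunner endpoint answers over HTTP. BOARD_IP below is a placeholder I introduce for illustration (replace it with the board's actual address), and the port must match the -H value passed to modelrunner:

```shell
# Hypothetical connectivity check for the modelrunner profiling server.
BOARD_IP=192.168.0.42   # placeholder; use your board's IP here
PORT=10818              # must match the -H argument given to modelrunner
URL="http://$BOARD_IP:$PORT"
# curl exits non-zero if the endpoint does not answer; report either way so
# the check never aborts a calling script.
if command -v curl >/dev/null 2>&1; then
  curl --silent --max-time 2 "$URL" >/dev/null \
    && echo "modelrunner reachable at $URL" \
    || echo "no response from $URL"
fi
```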
However, what about my last point, regarding models that use custom tflite operators? So far I haven't seen any option for this in the executable itself, and no source code online.
Thanks