We're looking to accelerate the YOLOv5 model with the NPU on Android (i.MX 8M Plus).
On Linux, this is done with the libvx_delegate.so backend.
Example on Linux:
> ./benchmark_model --graph=yolov5n-int8-250.tflite --external_delegate_path=/usr/lib/libvx_delegate.so
<output trimmed>
Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=60 first=16580 curr=16497 min=16464 max=16735 avg=16536.7 std=49
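For reference, benchmark_model reports these timings in microseconds, so the Linux run averages roughly 16.5 ms per inference. A minimal sketch of that conversion, assuming the summary line format shown above (the helper name parse_benchmark_stats is hypothetical, not part of any TensorFlow Lite tooling):

```python
import re


def parse_benchmark_stats(line):
    """Parse a 'key=value' summary line from benchmark_model into floats."""
    return {key: float(value) for key, value in re.findall(r"(\w+)=([\d.]+)", line)}


# Summary line copied from the Linux run above (values are in microseconds).
line = "count=60 first=16580 curr=16497 min=16464 max=16735 avg=16536.7 std=49"
stats = parse_benchmark_stats(line)

avg_ms = stats["avg"] / 1000.0   # microseconds -> milliseconds
fps = 1000.0 / avg_ms            # inferences per second at that latency

print(f"avg latency: {avg_ms:.2f} ms, throughput: {fps:.1f} inferences/s")
```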
On Android, this delegate is not present. However, libovxlib.so is, and it is apparently the backend that the Android HAL uses to reach the NPU for NNAPI.
NNAPI, however, cannot accelerate the YOLOv5 model.
Example on Android:
> ./benchmark_model --graph=yolov5n-int8-250.tflite --use_nnapi=true
STARTING!
Log parameter values verbosely: [0]
Graph: [yolov5n-int8-250.tflite]
Use NNAPI: [1]
NNAPI accelerators available: [vsi-npu,nnapi-reference]
Loaded model yolov5n-int8-250.tflite
INFO: Initialized TensorFlow Lite runtime.
INFO: Created TensorFlow Lite delegate for NNAPI.
NNAPI delegate created.
WARNING: NNAPI SL driver did not implement SL_ANeuralNetworksDiagnostic_registerCallbacks!
VERBOSE: Replacing 273 node(s) with delegate (TfLiteNnapiDelegate) node, yielding 7 partitions.
WARNING: NNAPI SL driver did not implement SL_ANeuralNetworksDiagnostic_registerCallbacks!
WARNING: NNAPI SL driver did not implement SL_ANeuralNetworksDiagnostic_registerCallbacks!
WARNING: NNAPI SL driver did not implement SL_ANeuralNetworksDiagnostic_registerCallbacks!
WARNING: NNAPI SL driver did not implement SL_ANeuralNetworksDiagnostic_registerCallbacks!
Explicitly applied NNAPI delegate, and the model graph will be partially executed by the delegate w/ 4 delegate kernels.
The input model file size (MB): 2.16466
Initialized session in 939.695ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
ERROR: NN API returned error ANEURALNETWORKS_OP_FAILED at line 5140 while running computation.
ERROR: Node number 284 (TfLiteNnapiDelegate) failed to invoke.
count=1 curr=1374449
Benchmarking failed.
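The key facts in the Android log can be pulled out with a short script. This is only a sketch, assuming the log line formats shown in this post; the function name summarize_delegate_log is hypothetical. It reports how many nodes NNAPI claimed, how many partitions the graph was split into, how many delegate kernels were created, and which node failed. With 7 partitions and 4 delegate kernels, the remaining 3 partitions presumably fall back to the CPU.

```python
import re

# Relevant lines copied from the Android run above.
LOG = """\
VERBOSE: Replacing 273 node(s) with delegate (TfLiteNnapiDelegate) node, yielding 7 partitions.
Explicitly applied NNAPI delegate, and the model graph will be partially executed by the delegate w/ 4 delegate kernels.
ERROR: NN API returned error ANEURALNETWORKS_OP_FAILED at line 5140 while running computation.
ERROR: Node number 284 (TfLiteNnapiDelegate) failed to invoke.
"""


def summarize_delegate_log(log):
    """Extract partitioning and failure details from benchmark_model output."""
    summary = {}
    m = re.search(
        r"Replacing (\d+) node\(s\) with delegate \((\w+)\) node, yielding (\d+) partitions",
        log,
    )
    if m:
        summary["delegated_nodes"] = int(m.group(1))
        summary["delegate"] = m.group(2)
        summary["partitions"] = int(m.group(3))
    m = re.search(r"w/ (\d+) delegate kernels", log)
    if m:
        summary["delegate_kernels"] = int(m.group(1))
    m = re.search(r"Node number (\d+) \((\w+)\) failed to invoke", log)
    if m:
        summary["failed_node"] = int(m.group(1))
    return summary


summary = summarize_delegate_log(LOG)
# Partitions not handled by a delegate kernel (assumed to run on the CPU).
summary["other_partitions"] = summary["partitions"] - summary["delegate_kernels"]
print(summary)
```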
The Linux tests show that the NPU hardware is capable of processing the model. So my question is: how can we compile the vx_delegate that works on Linux for Android, or perhaps use ovxlib directly to bypass NNAPI on Android?
* Tests done with Android 13 2.0.0 on imx8mpevk, using the TensorFlow Lite 2.10.1 benchmark utility
 JosephAtNXP
Hi,
Thank you for your interest in NXP Semiconductor products.
The Machine Learning User's Guide states the following
(https://www.nxp.com/docs/en/user-guide/IMX-MACHINE-LEARNING-UG.pdf):
"The NNAPI Delegate for Linux platform is deprecated and will be removed in the future. Use VX Delegate instead."
The Linux BSP supports OpenVX (TIM-VX) instead of NNAPI.
However, according to the document below, Android supports only NNAPI (NNRT).
(https://www.nxp.com/docs/en/user-guide/IMX_ANDROID_TENSORFLOWLITE_USERS_GUIDE.pdf)
NNAPI is the only delegate on Android, and the future plans for Android continue to use NNAPI. The VX delegate is supported only on Linux.
Regards
 JosephAtNXP
Hi,
Currently there is no development on Android that uses libovx. libovx works exclusively over OpenVX, which makes it impossible to use libovx over NNAPI in the current and upcoming releases.
As you said, the model is well accelerated on Linux, so that is the best option you have.
Regards,
 JosephAtNXP
Hi,
I don't think there's any work to do on NNAPI. If the results match, you can still expect timing differences between vx_delegate and NNAPI.
Regards,
Thank you for the reply. This hasn't answered the question, though.
1) I realize it isn't supported, but is it possible to use vx_delegate on Android? For example, if I compile it with the NDK and use it through JNI.
or
2) Can I use the libraries that are already on Android (libovxlib.so) directly through the NDK instead of using NNAPI? (I don't know whether this would work, or whether it is effectively the same as using NNAPI.)
Using NNAPI doesn't accelerate the model, so our product cannot function properly on Android as is.
For context, when we first started this project, support told us through the forums and the ticket system that this was a bug that would be fixed in Android 12. Our customer continued development on the i.MX 8M Plus, trusting NXP that we would be able to accelerate this model on Android. Later, after we had waited over a year through multiple Android releases with no change, support's story changed to say that it wouldn't be supported. This is a major setback for us, and we need to focus on finding a solution even if it isn't supported. We know the hardware is capable of accelerating the model.
