Well, it only took a couple of months and a second support ticket, but we finally resolved the issue we were having with processing YOLOv5 models.
We were on the BSP and TFLite versions NXP recommends but still weren't getting valid results from the NPU.
Last week I received a patch file from NXP support for the vx-delegate op_map.cc file.
I didn't get a chance to apply and test it right away because I was traveling, but I tested it yesterday and we now get nearly identical results when using the NPU. It was a fairly significant change, and I'm not sure why their Yocto build didn't already have this patch applied.
But at least that is resolved.
There's one other issue, which others have also reported, that we see with our C++ test application but not with the NXP Python test app, so I've asked support what they can suggest.
So, if you have the proper BSP and TFLite versions and still can't get NPU processing working with YOLOv5 models, try opening a ticket and requesting the patch file for the vx-delegate op_map.cc file. Hopefully that will fix it.
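
For anyone who wants to check whether their own NPU results are off before opening a ticket, here is a rough Python sketch that runs the same input through the CPU path and through the vx-delegate and reports the largest output difference. It is not NXP's test app. It assumes the tflite_runtime package is installed on the board, that the delegate library lives at /usr/lib/libvx_delegate.so (adjust for your image), and that the model has a single input and you only care about the first output. The function names are my own.

```python
def load_interpreter(model_path, delegate_path=None):
    """Build a TFLite interpreter, optionally with an external delegate."""
    # Imported here so the comparison helper below works without TFLite.
    import tflite_runtime.interpreter as tflite

    delegates = []
    if delegate_path:
        # Assumption: delegate_path points at the vx-delegate shared library.
        delegates.append(tflite.load_delegate(delegate_path))
    interp = tflite.Interpreter(model_path=model_path,
                                experimental_delegates=delegates)
    interp.allocate_tensors()
    return interp


def run_once(interp, input_data):
    """Run one inference and return the first output as a flat list."""
    inp = interp.get_input_details()[0]
    out = interp.get_output_details()[0]
    # input_data must already match the model's input shape and dtype.
    interp.set_tensor(inp["index"], input_data)
    interp.invoke()
    return interp.get_tensor(out["index"]).flatten().tolist()


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two flat outputs."""
    a, b = list(a), list(b)
    if len(a) != len(b):
        raise ValueError("outputs differ in length")
    return max((abs(float(x) - float(y)) for x, y in zip(a, b)), default=0.0)


def compare_cpu_npu(model_path, input_data,
                    delegate_path="/usr/lib/libvx_delegate.so"):
    """Run the same input on CPU and via the delegate, return the max diff."""
    cpu_out = run_once(load_interpreter(model_path), input_data)
    npu_out = run_once(load_interpreter(model_path, delegate_path), input_data)
    return max_abs_diff(cpu_out, npu_out)
```

Call compare_cpu_npu with your model path and a preprocessed input array. For a quantized model a difference of a few quantization steps is probably fine, while a large difference suggests the NPU path is producing bad results. What counts as "large" depends on your model, so treat any threshold as a judgment call.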