Interesting! Have you experimented with compiler optimization flags? Raising the optimization level (e.g., -O3) can often speed up code noticeably when the NPU isn't active, though it's worth measuring, since higher levels can also increase code size. Also, regarding D-Cache, check the SDK documentation for CACHE_Enable or similar functions, specifically for the MCXN947. Remember that optimizing often comes down to a balancing act between what the code can do and what the hardware allows. Good luck!
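
If it helps, here is a rough sketch of how you could compare optimization levels by timing the same workload under each one. Everything in it is an assumption for illustration: `dot_product`, `time_workload`, the buffer size `N`, and the `BENCH_MAIN` macro are hypothetical names, and the loop is only a stand-in for your real CPU-side code. It uses the standard C `clock()` so it runs on a desktop; on the MCXN947 itself you would probably swap that for a hardware cycle counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define N 1024 /* hypothetical buffer size */

static int16_t a[N];
static int16_t b[N];

/* Hypothetical workload: a multiply-accumulate loop standing in for
 * whatever runs on the CPU when the NPU is not used. */
int32_t dot_product(const int16_t *x, const int16_t *y, size_t n)
{
    assert(x != NULL && y != NULL);
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (int32_t)x[i] * (int32_t)y[i];
    }
    return acc;
}

/* Runs the workload `reps` times and returns the elapsed CPU time in
 * seconds. The last result is written to *last if it is non-NULL. */
double time_workload(unsigned reps, int32_t *last)
{
    for (size_t i = 0; i < N; i++) {
        a[i] = (int16_t)(i % 7);
        b[i] = (int16_t)((i % 5) + 1);
    }

    /* volatile sink so the compiler cannot discard the results */
    volatile int32_t sink = 0;

    clock_t start = clock();
    for (unsigned r = 0; r < reps; r++) {
        /* Change one input per repetition so the call is less likely
         * to be hoisted out of the loop at high optimization levels. */
        a[0] = (int16_t)(r % 100);
        sink = dot_product(a, b, N);
    }
    clock_t end = clock();

    if (last != NULL) {
        *last = sink;
    }
    return (double)(end - start) / CLOCKS_PER_SEC;
}

#ifdef BENCH_MAIN
int main(void)
{
    int32_t last = 0;
    double secs = time_workload(100000u, &last);
    printf("elapsed: %.6f s (last result %ld)\n", secs, (long)last);
    return 0;
}
#endif
```

To try it on a host machine, build it twice, for example `gcc -O0 -DBENCH_MAIN bench.c -o bench_O0` and `gcc -O3 -DBENCH_MAIN bench.c -o bench_O3`, then compare the printed times. Desktop numbers won't carry over to the board, so treat this only as a template for the measurement.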