Hello NXP Community,
I’m currently developing an edge AI application for real-time video analysis on a microcontroller-based system. The processing pipeline consists of camera input, lightweight object tracking, and decision feedback over a wireless link. Performance and timing are critical, and I’m exploring ways to handle data more efficiently at the edge without always relying on cloud inference.
While reading about high-throughput consumer systems such as gaming PCs, I noticed how those machines move large volumes of data quickly while managing heat and power consumption. I’m obviously not using that class of hardware, but I’m curious whether similar principles apply to embedded architectures. Are there NXP-recommended best practices for handling bursty data streams, efficient buffering, or minimal-latency responses on devices such as the i.MX or LPC series?
Also, if anyone has experience using NXP’s ML toolkits or optimizing CMSIS-NN layers on constrained hardware, I’d love to hear your insights. I want to avoid performance bottlenecks at any step—whether it’s frame capture, preprocessing, or ML inference.
Really appreciate the expertise and support of the NXP Community. Looking forward to hearing how others have tackled similar challenges in edge AI or high-speed embedded workloads.
Thanks in advance!
Zhiming_Liu
		Hi,
Regarding AI on i.MX, you can refer to these two documents, especially the second one about AI application demonstrations.
https://www.nxp.com/docs/en/user-guide/UG10166.pdf
https://www.nxp.com/docs/en/user-guide/GPNTUG.pdf
Regarding the performance and throughput optimization you mentioned, I suggest evaluating it against the existing demos provided by NXP. For example, on the i.MX8MP with an NPU, performance depends not only on the inference framework at the application layer but also on optimizations on the NPU side. Generally, customers do not need to concern themselves with these two aspects. Performance bottlenecks can sometimes be resolved through software optimization, but most of the time they come from hardware limitations.
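To find out which stage is actually the bottleneck before trying to optimize it, one option is to time each stage of the pipeline separately. The following is only a minimal sketch in plain Python: `profile_pipeline`, `capture`, `preprocess`, and `infer` are hypothetical names made up for illustration and are not part of any NXP demo or toolkit. The stand-in stages would need to be replaced with the real camera read, preprocessing, and inference calls. The `warmup` parameter is an assumption that the first few frames may include one-time setup costs and should not be counted.

```python
import time
from statistics import mean


def profile_pipeline(stages, frames, warmup=2):
    """Return the mean latency of each stage in milliseconds.

    stages: list of (name, function) pairs, run in order on every frame.
    frames: iterable of input frames.
    warmup: number of initial frames to run but exclude from the statistics.
    """
    timings = {name: [] for name, _ in stages}
    for index, frame in enumerate(frames):
        data = frame
        for name, stage in stages:
            start = time.perf_counter()
            data = stage(data)
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            if index >= warmup:
                timings[name].append(elapsed_ms)
    return {name: mean(values) for name, values in timings.items() if values}


# Hypothetical stand-in stages; replace with the real pipeline calls.
def capture(frame):
    return frame


def preprocess(frame):
    # Scale 8-bit pixel values to the range [0, 1].
    return [pixel / 255.0 for pixel in frame]


def infer(frame):
    time.sleep(0.005)  # placeholder for the real inference call
    return sum(frame) > 0


if __name__ == "__main__":
    stages = [("capture", capture), ("preprocess", preprocess), ("inference", infer)]
    frames = [[128] * 1024 for _ in range(10)]
    report = profile_pipeline(stages, frames)
    # Print the slowest stage first.
    for name, ms in sorted(report.items(), key=lambda item: item[1], reverse=True):
        print(f"{name:<12}{ms:8.3f} ms")
```

Comparing these per-stage numbers with the figures from an NXP demo on the same board should indicate whether the limit is in your own code or in the hardware.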
Best Regards,
Zhiming
