Which device is supported to run Quantized Gemma Model Inference



1,551 Views
ramkumarkoppu_p
Contributor III

Hi, 

Out of the i.MX RT700 and i.MX 95 devices, which one has full software support for running inference of generative AI models such as a quantized version of Google's Gemma model, first in Python and then in C/C++, using the device's NPU? Specifically:

  • Which device's NPU supports transformer-based architectures, and which is limited to CNNs?

  • Which inference frameworks are supported for generative AI on the eIQ platform?
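For context on what I mean by a quantized model: the weights are stored in a low-precision integer format (e.g. int8) with a per-tensor scale, and dequantized on the fly during inference. A minimal sketch of symmetric int8 quantization (illustrative only, not tied to any specific NXP API):

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: map floats into [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from int8 values and the scale."""
    return q.astype(np.float32) * scale

# Example: round-trip a small weight tensor.
w = np.array([0.5, -1.0, 0.25], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
```

The question is essentially which device/NPU stack can execute this kind of low-precision transformer arithmetic natively, rather than falling back to CPU.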

1 Reply

1,536 Views
ramkumarkoppu_p
Contributor III

In particular, has NXP ported llama.cpp to run on the NPU of either of these devices?
