Getting Started with GenAI Flow on i.MX95 – need help and tips


Alicecarry
Contributor I

Hi everyone,

I’m a developer active on the NXP Generative AI & LLMs forum, and I’ve recently started exploring the eIQ GenAI Flow demonstrator on an i.MX95 EVK. I successfully cloned the demo from the AppCodeHub repository and began cross-compiling in a Yocto environment, but I’ve hit a few snags along the way. I understand that a compatible BSP is required, plus the optional meta-layer depending on the silicon revision.

In addition, I’m curious how AI inference performance differs between the i.MX95 and i.MX8MP in practice. I’ve seen benchmarks suggesting token generation speeds of around 9 tokens/sec on the i.MX95 with Neutron acceleration versus ~8.7 tokens/sec on the i.MX8MP in the CPU-only case. But I’d love to hear firsthand feedback—especially from developers already running RAG and LLM pipelines on these platforms.
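For what it's worth, here is a quick bit of arithmetic on those quoted figures (just rearranging the numbers above, not new measurements), to see what the gap means in relative and per-token terms:

```python
# Rough arithmetic on the quoted benchmark figures (not new measurements).
imx95_tps = 9.0    # tokens/sec, i.MX95 with Neutron acceleration
imx8mp_tps = 8.7   # tokens/sec, i.MX8MP, CPU-only
speedup_pct = (imx95_tps / imx8mp_tps - 1) * 100
per_token_ms = 1000 / imx95_tps
print(f"i.MX95 is ~{speedup_pct:.1f}% faster, ~{per_token_ms:.0f} ms/token")
```

So on those numbers the headline speedup is only a few percent, which is part of why I'd like firsthand feedback on where Neutron actually helps.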

Questions:

  • Which BSP version (e.g. L6.12.20 vs L6.12.3) should I target for smooth deployment?

  • Any tips on enabling Neutron successfully on i.MX95 A1 vs B0 revisions?

  • Experiences with real-time inference latency and memory constraints using RAG under eIQ GenAI Flow?

Appreciate any advice—especially configuration notes or lessons learned—before I press further.

Thanks in advance!

1 Reply

Pierre_M
NXP Employee

Hi @Alicecarry,

Thanks for your message!

Let me try to answer your questions.

For the BSP version to use with GenAI Flow Demonstrator v1.1, we recommend using L6.12.20. On this BSP, the provided meta-layer is no longer necessary for i.MX95 B0, as it already includes built-in LLM acceleration via the Neutron NPU. The meta-layer is only required to customize the L6.12.3 BSP to support LLM acceleration on i.MX95 A1. 
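For completeness, when the meta-layer is needed (the L6.12.3 + A1 route), wiring it into a Yocto build is just a bblayers entry. The checkout path below is an assumption—adjust it to wherever you cloned the demo:

```
# conf/bblayers.conf (the path to the clone is hypothetical)
BBLAYERS += "/path/to/dm-eiq-genai-flow-demonstrator/meta-eiq-genai-flow"
```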

With the i.MX95 A1 revision, you have two options to leverage Neutron:

  • Use the L6.12.3 BSP + meta-layer, or
  • Use the L6.12.20 BSP, which is simpler, but requires two considerations:
  1. When flashing the image with uuu, use the standard imx-image-full-imx95evk.wic, but a specific imx-boot must be used for the A1 revision. This file is included in the BSP release archive and is named:
    imx-boot-imx95-a1-19x19-lpddr5-evk-sd.bin-flash_a55
    (for the i.MX95 A1 19x19 EVK).
  2. This BSP includes a Neutron firmware that is only compatible with the B0 revision, so LLM acceleration will not work by default. To enable it, replace the firmware with the one provided in the meta-layer: dm-eiq-genai-flow-demonstrator/meta-eiq-genai-flow/recipes-libraries/neutron/files/NeutronFwllm.elf. Copy it to /lib/firmware/ on the i.MX95 EVK file system.
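To make step 2 concrete: the firmware swap is a single copy into /lib/firmware on the target. The sketch below stages it against a scratch directory standing in for the EVK root file system (e.g. an SD card partition mounted on your host; use scp instead if the board is on the network), with a placeholder file standing in for the real NeutronFwllm.elf from the meta-layer:

```shell
# Stand-in for the EVK root file system (e.g. a mounted SD card partition).
ROOTFS="$(mktemp -d)"
mkdir -p "$ROOTFS/lib/firmware"

# Placeholder for the real firmware from
# meta-eiq-genai-flow/recipes-libraries/neutron/files/NeutronFwllm.elf
touch NeutronFwllm.elf

# The actual swap: replace the B0-only firmware with the A1-capable one.
cp NeutronFwllm.elf "$ROOTFS/lib/firmware/NeutronFwllm.elf"
ls "$ROOTFS/lib/firmware"
```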

Note that in either case you need to select a "Neutron" dtb in u-boot after flashing.
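For anyone following along, selecting the dtb from the u-boot prompt looks roughly like this. The exact dtb file name varies by BSP release and EVK variant, so treat the name below as a placeholder and check the dtbs shipped in your boot partition:

```
u-boot=> setenv fdtfile imx95-19x19-evk-neutron.dtb   # placeholder name
u-boot=> saveenv
u-boot=> boot
```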

This demonstrator showcases only a subset of our internal capabilities. Internally, we have additional ASR and LLM models to fine-tune latency and performance for specific use cases. For example, using Whisper-small.en ASR with Danube-500M (q8) LLM, we achieve a "Time to First Audio (TTFA)"—the delay between the end of input speech and the start of output speech—of approximately 3.5 seconds, which is acceptable for many use cases.

With other ASR models, we can reduce this TTFA to below 3 seconds with LLM, and in RAG-only mode (without LLM), we can go below 2 seconds.
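To help you budget your own pipeline: TTFA is roughly the sum of ASR time after end of speech, the LLM's time to its first tokens, and synthesis of the first audio chunk. The split below is purely illustrative (my assumed numbers, not measured values), chosen only to land near the 3.5 s figure above:

```python
# Illustrative TTFA budget; every component value here is an assumption.
asr_s = 1.2        # ASR (e.g. Whisper-small.en) after end of input speech
llm_first_s = 1.5  # LLM time to produce the first sentence worth of tokens
tts_first_s = 0.8  # synthesis of the first audio chunk
ttfa = asr_s + llm_first_s + tts_first_s
print(f"TTFA ≈ {ttfa:.1f} s")
```

Shaving any one of those components (a smaller ASR model, a faster-starting LLM, streaming TTS) is what moves you below the 3 s mark mentioned above.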

Hope this helps!
Thanks again,
Pierre

