Hi @Alicecarry,
Thanks for your message!
Let me try to answer your questions.
For the GenAI Flow Demonstrator v1.1, we recommend using the L6.12.20 BSP. On i.MX95 B0, this BSP already includes built-in LLM acceleration via the Neutron NPU, so the provided meta-layer is no longer necessary. The meta-layer is only required to customize the L6.12.3 BSP to support LLM acceleration on i.MX95 A1.
With the i.MX95 A1 revision, you have two options to leverage Neutron:
- Use the L6.12.3 BSP + meta-layer, or
- Use the L6.12.20 BSP, which is simpler but requires two additional steps:
  - When flashing the image with uuu, use the standard imx-image-full-imx95evk.wic, but with the A1-specific imx-boot file. This file is included in the BSP release archive and is named:
    imx-boot-imx95-a1-19x19-lpddr5-evk-sd.bin-flash_a55
    (for the i.MX95 A1 19x19 EVK).
  - This BSP ships a Neutron firmware that is only compatible with the B0 revision, so LLM acceleration will not work by default. To enable it, replace the firmware with the one provided in the meta-layer:
    dm-eiq-genai-flow-demonstrator/meta-eiq-genai-flow/recipes-libraries/neutron/files/NeutronFwllm.elf
    Copy it to /lib/firmware/ on the i.MX95 EVK file system.
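If you prefer to script the firmware swap, here is a minimal Python sketch. It assumes the B0 firmware shipped in L6.12.20 uses the same file name (NeutronFwllm.elf), so the copy overwrites it. The function name and the ".b0.bak" backup suffix are hypothetical. The target directory is a parameter so the same code can run on the board (default /lib/firmware, needs root) or on a host against the EVK root file system mounted from the SD card.

```python
import shutil
from pathlib import Path


def install_neutron_firmware(src, firmware_dir=Path("/lib/firmware")) -> Path:
    """Copy the meta-layer NeutronFwllm.elf into firmware_dir.

    Hypothetical helper: any existing file with the same name is backed up
    with a ".b0.bak" suffix before being overwritten.
    """
    src = Path(src)
    firmware_dir = Path(firmware_dir)
    if not src.is_file():
        raise FileNotFoundError(f"firmware not found: {src}")

    dest = firmware_dir / src.name
    if dest.exists():
        # Keep the original (B0-only) firmware so the change can be reverted.
        backup = dest.with_name(dest.name + ".b0.bak")
        shutil.copy2(dest, backup)

    # Overwrite with the A1-compatible firmware from the meta-layer.
    shutil.copy2(src, dest)
    return dest
```

For example, on the board after copying the file over: install_neutron_firmware("/home/root/NeutronFwllm.elf"). Rebooting afterwards is the simplest way to make sure the new file is picked up, though whether a reboot is strictly required is an assumption here.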
Note that in all cases, you need to select a "Neutron" dtb in U-Boot after flashing.
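To recap the supported combinations, the following Python sketch maps a silicon revision and BSP version to the steps described in this reply. The function name and step wording are illustrative only; combinations not covered here (for example B0 on L6.12.3) raise an error rather than guess.

```python
WIC = "imx-image-full-imx95evk.wic"
A1_BOOT = "imx-boot-imx95-a1-19x19-lpddr5-evk-sd.bin-flash_a55"
FW_PATH = ("dm-eiq-genai-flow-demonstrator/meta-eiq-genai-flow/"
           "recipes-libraries/neutron/files/NeutronFwllm.elf")


def setup_steps(revision: str, bsp: str = "L6.12.20") -> list[str]:
    """Return setup steps for an i.MX95 EVK revision ("A1" or "B0") and BSP."""
    key = (revision.strip().upper(), bsp)
    if key == ("B0", "L6.12.20"):
        # B0 on L6.12.20: LLM acceleration is built in, no meta-layer needed.
        steps = [f"flash {WIC} with the standard imx-boot"]
    elif key == ("A1", "L6.12.3"):
        # A1 on L6.12.3: the demonstrator meta-layer adds LLM acceleration.
        steps = ["add the meta-eiq-genai-flow layer to the L6.12.3 BSP build and flash the result"]
    elif key == ("A1", "L6.12.20"):
        # A1 on L6.12.20: A1-specific imx-boot plus the meta-layer firmware.
        steps = [
            f"flash {WIC} with {A1_BOOT}",
            f"copy {FW_PATH} to /lib/firmware/ on the EVK",
        ]
    else:
        raise ValueError(f"combination not covered: {key}")
    # Required in every case.
    steps.append('select a "Neutron" dtb in U-Boot')
    return steps
```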
This demonstrator showcases only a subset of our internal capabilities. Internally, we have additional ASR and LLM models that let us tune latency and performance for specific use cases. For example, using Whisper-small.en ASR with the Danube-500M (q8) LLM, we achieve a Time to First Audio (TTFA), i.e. the delay between the end of input speech and the start of output speech, of approximately 3.5 seconds, which is acceptable for many use cases.
With other ASR models, we can bring TTFA below 3 seconds with the LLM, and below 2 seconds in RAG-only mode (without the LLM).
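To compare your own setup against these figures, TTFA can be measured as the difference between two timestamps. Below is a minimal Python sketch; the class and method names are hypothetical, and where to call them (for example when voice-activity detection reports end of speech, and when the first synthesized audio chunk reaches the output device) depends on your pipeline and is an assumption, not part of the demonstrator.

```python
import time


class TTFATimer:
    """Hypothetical helper measuring Time to First Audio (TTFA):
    the delay from the end of input speech to the start of output speech."""

    def __init__(self, clock=time.monotonic):
        # A monotonic clock is unaffected by wall-clock adjustments.
        self._clock = clock
        self._speech_end = None

    def mark_speech_end(self) -> None:
        # Call when the end of the user's utterance is detected.
        self._speech_end = self._clock()

    def mark_first_audio(self) -> float:
        # Call when the first output audio starts playing; returns TTFA in seconds.
        if self._speech_end is None:
            raise RuntimeError("mark_speech_end() must be called first")
        return self._clock() - self._speech_end
```

The clock is injectable so the timer can be tested with fixed timestamps, e.g. end of speech at 10.0 s and first audio at 13.5 s gives a TTFA of 3.5 s.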
Hope this helps!
Thanks again,
Pierre