Hello everyone,
I was wondering whether it is feasible to deploy attention-based text decoders on the i.MX 8M Plus.
Does the NPU support those layers and operations?
Alternatively, is it feasible to deploy it on the CPU?
We are talking about a text decoder with about 60M parameters.
Thank you in advance
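For scale, a rough weight-memory estimate for a 60M-parameter decoder at common precisions (illustrative only; actual runtime memory also includes activations and the KV cache):

```python
# Back-of-envelope weight-memory estimate for a 60M-parameter text decoder.
# Hypothetical figures for sizing, not measured numbers from any NXP board.
PARAMS = 60e6

bytes_per_param = {"float32": 4, "float16": 2, "int8": 1}

for dtype, nbytes in bytes_per_param.items():
    mb = PARAMS * nbytes / (1024 ** 2)
    print(f"{dtype}: ~{mb:.0f} MB of weights")
```

At int8 the weights alone are under 60 MB, which is well within the RAM typically fitted on i.MX 8M Plus boards.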
Hello,
We have verified such a model, with 1.1 billion parameters, on the i.MX 8M Plus.
For more detail, please refer to this whitepaper:
https://www.nxp.com/webapp/Download?colCode=GEN-AI-RAG-WHITEPAPER
Best Regards,
Zhiming
Hi Zhiming,
Thank you for your answer. I will read the paper.
One question: if you were able to deploy such models on the i.MX 8M Plus, why can't I seem to find support for the MultiHeadAttention layer here (Chapter 11): https://www.nxp.com/docs/en/user-guide/IMX-MACHINE-LEARNING-UG.pdf?
Maybe I'm not looking in the right place?
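One possible explanation, sketched here as an assumption rather than a statement about the eIQ toolchain: a fused MultiHeadAttention layer is usually lowered by model converters into primitive ops (MatMul, Softmax, transpose, reshape), so the fused layer itself may never appear in a supported-operations table even when every op it lowers to is supported. A minimal NumPy sketch of that decomposition, with hypothetical shapes:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, wq, wk, wv, wo, num_heads):
    """Multi-head self-attention built only from matmul, softmax,
    reshape, and transpose, the primitives converters lower it to."""
    seq, d_model = x.shape
    d_head = d_model // num_heads
    # Q/K/V projections: plain MatMul ops, then split into heads
    q = (x @ wq).reshape(seq, num_heads, d_head).transpose(1, 0, 2)
    k = (x @ wk).reshape(seq, num_heads, d_head).transpose(1, 0, 2)
    v = (x @ wv).reshape(seq, num_heads, d_head).transpose(1, 0, 2)
    # Scaled dot-product attention: MatMul + Softmax + MatMul
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    out = softmax(scores) @ v
    # Merge heads and apply the output projection
    return out.transpose(1, 0, 2).reshape(seq, d_model) @ wo

rng = np.random.default_rng(0)
d_model, seq, heads = 64, 10, 4
ws = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(4)]
y = multi_head_attention(rng.standard_normal((seq, d_model)), *ws, heads)
print(y.shape)  # (10, 64)
```

So a practical check may be to look for MatMul, Softmax, Transpose, and Reshape in the supported-ops list rather than for a single fused attention layer.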
Thank you in advance for your time.
Kind Regards,
Marco Donnarumma
Hello,
The LLM project and the LLM fine-tuning tool for eIQ have not been released yet. NXP will release a version of eIQ that supports deploying LLM models.
Best Regards,
Zhiming
Hello,
Do we know whether the release will happen in the near future?
Thank you in advance.
Marco
Hello,
The demo is expected to be released at the end of 2025 Q1, with eIQ support following later; the actual release date depends on the project schedule.
Best Regards,
Zhiming