Hello
While reading the GEN-AI-RAG-WHITEPAPER.pdf provided by NXP, I found that the quantized TinyLlama-1B model was ported to the i.MX95 device using the Neutron Execution Provider within the ONNXRT runtime.
I have a few questions regarding this:
- Does ONNXRT refer to onnxruntime?
- I would like to try the Neutron Execution Provider with ONNXRT as well. Could you provide guidance on how to set up the environment for this? Additionally, how can I apply it in code (see the sketch below for what I have in mind)?
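
For reference, this is roughly the usage I am imagining, based on the standard onnxruntime Python API. The model path and the "NeutronExecutionProvider" string are just my guesses, so please correct them if the actual Neutron/eIQ setup looks different:

```python
import onnxruntime as ort

# Check which execution providers this onnxruntime build actually exposes
print(ort.get_available_providers())

# "NeutronExecutionProvider" is an assumed name and the model path is a
# placeholder -- I don't know what the real provider string is called.
session = ort.InferenceSession(
    "tinyllama-1b-quantized.onnx",
    providers=[
        "NeutronExecutionProvider",  # assumed Neutron EP name, please correct
        "CPUExecutionProvider",      # fallback for unsupported operators
    ],
)

# Then run inference with whatever inputs the model expects, e.g.:
# outputs = session.run(None, {"input_ids": input_ids})
```

Is this the right general pattern, or does the Neutron Execution Provider require a custom onnxruntime build or additional session options?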
A detailed explanation would be greatly appreciated. Thank you in advance for your support!