Hello everyone,
I was wondering whether it is feasible to deploy attention-based text decoders on the i.MX 8M Plus.
Does the NPU support those layers and operations?
Alternatively, is it feasible to deploy it on the CPU?
We are talking about a text decoder with 60M parameters.
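For a sense of scale, here is a rough back-of-the-envelope sketch of the weight storage such a model would need. The bytes-per-parameter values are my assumptions about possible storage formats (not something I have validated on the board), and the estimate ignores activations and any key/value cache.

```python
# Rough weight-memory estimate for a 60M-parameter text decoder.
# The storage formats below are assumptions, not measured on hardware.
PARAMS = 60_000_000

BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_megabytes(params: int, bytes_per_param: int) -> float:
    """Return weight storage in MB (10^6 bytes), weights only."""
    return params * bytes_per_param / 1e6


for fmt, size in BYTES_PER_PARAM.items():
    print(f"{fmt}: {weight_megabytes(PARAMS, size):.0f} MB")
```

So the weights alone would range from roughly 60 MB (int8) to 240 MB (float32), before counting runtime buffers.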
Thank you in advance.