I don't understand how you can conclude that this application demands more processing power than can be found in an RT1060 and that an "MPU" is needed?
It all depends on how often you need to classify an object. I can easily run TensorFlow Lite on an LPC54628 and do classification. I can't do 100 image classifications per second on it, just as you can't run 1000 classifications per second on an "MPU".
It all comes down to requirements.
You can even do this stuff on an old ARM7 LPC2458 running at 72 MHz, if that speed is enough for your application.
So before answering such a question, I think it is important to ask a few questions in return.
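To make the requirements point concrete, here is a rough sketch in C of the arithmetic involved. The function names are hypothetical, and it assumes one classification runs at a time and that you have measured the time for a single inference on your own board with your own model; nothing here comes from a datasheet.

```c
#include <assert.h>

/* Hypothetical helper: highest classification rate a part can sustain,
 * given the measured time for one inference in milliseconds. */
static double max_rate_per_second(double inference_ms)
{
    assert(inference_ms > 0.0);
    return 1000.0 / inference_ms;
}

/* Hypothetical helper: returns 1 if one inference fits inside the time
 * budget implied by the required rate, otherwise 0. */
static int meets_rate(double inference_ms, double required_per_second)
{
    assert(required_per_second > 0.0);
    /* Time available for each classification at the required rate. */
    double budget_ms = 1000.0 / required_per_second;
    return inference_ms <= budget_ms;
}
```

With a made-up figure of 250 ms per inference, `max_rate_per_second(250.0)` gives 4 classifications per second, so `meets_rate(250.0, 1.0)` returns 1 and `meets_rate(250.0, 100.0)` returns 0. The same part is either plenty or hopeless depending only on the rate you ask for.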