Q: What is VoiceSpot?
A: VoiceSpot is a high-performance wake word engine with a very low memory and MHz footprint. It is well suited to ultra-low-power, always-on applications where high recognition rates in noisy environments are important.
Q: Can multiple wake words be supported?
A: Yes.
Q: Are custom wake words supported?
A: Yes, custom wake words can be created on customer request. Creating a custom wake word requires a dataset, which can either be provided by the customer or collected by NXP at additional cost.
Q: What is the total memory and MHz footprint?
A: The exact numbers depend on the configuration and target platform. A typical implementation uses a 33 kB model and requires a total of 30–90 kB of RAM and 33–65 kB of flash, depending on where the model and code are placed. Processing requires 5–20 MHz, depending on the target platform.
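As a rough illustration of what the MHz figures mean on a given target, the sketch below converts the 5 and 20 MHz bounds into a share of a core clock. The 150 MHz clock and the helper name `cpu_load_percent` are assumptions chosen for illustration, not NXP figures or part of any VoiceSpot API.

```python
def cpu_load_percent(required_mhz: float, core_clock_mhz: float) -> float:
    """Return the share of the core clock consumed by processing, in percent."""
    if core_clock_mhz <= 0:
        raise ValueError("core clock must be positive")
    return 100.0 * required_mhz / core_clock_mhz


# Hypothetical 150 MHz core (an assumption); 5 and 20 MHz are the bounds
# of the processing range quoted in the answer above.
CORE_CLOCK_MHZ = 150.0
low = cpu_load_percent(5.0, CORE_CLOCK_MHZ)
high = cpu_load_percent(20.0, CORE_CLOCK_MHZ)
print(f"CPU load: {low:.1f}% to {high:.1f}%")
```

The same calculation with the actual clock of the target platform gives a first estimate of how much headroom remains for the rest of the application.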
Q: What is the latency?
A: The wake word is typically detected within ±200 ms of the subjective end of the wake word, which is faster than many other solutions. This helps reduce the latency perceived by the end user.
Q: What is the difference between VoiceSpot and VIT?
A: VoiceSpot relies on a dataset to specify the wake word, whereas VIT uses text input to specify the wake word and commands. This makes VoiceSpot less flexible, but more resource-efficient and higher performing. VoiceSpot is therefore a great choice when high performance and efficiency are important and a dataset is already available or can be made available with reasonable effort. VIT is a great choice if flexibility is important and the additional platform resources it needs are not an issue.