VoiceSpot FAQ


Q: What is VoiceSpot? 

A: VoiceSpot is a high-performance wake word engine with a very low memory and MHz footprint. It is well suited to ultra-low-power, always-on applications where high recognition rates in noisy environments are important. 


Q: Can multiple wake words be supported? 

A: Yes. 


Q: Are custom wake words supported? 

A: Yes. Custom wake words can be created on customer request. Creating a custom wake word requires a dataset, which can either be provided by the customer or collected by NXP at additional cost. 


Q: What is the total memory and MHz footprint? 

A: The exact numbers depend on the configuration and target platform. A typical implementation uses a 33 kB model and requires a total of 30–90 kB of RAM and 33–65 kB of flash, depending on where the model and code are placed. Depending on the target platform, 5–20 MHz is required for processing. 
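To make these figures concrete when sizing a design, here is a minimal sketch that checks a target's spare resources against the upper end of the typical ranges quoted above and converts the 5–20 MHz processing requirement into a share of a core's clock. The names Budget, fits_worst_case and cpu_load_percent are hypothetical helpers, not part of any NXP API, and the quoted ranges are typical values rather than guaranteed bounds.

```python
from dataclasses import dataclass

# Typical ranges quoted in this FAQ (hypothetical constants; actual values
# depend on the configuration and target platform).
RAM_KB = (30, 90)
FLASH_KB = (33, 65)
MHZ = (5, 20)


@dataclass
class Budget:
    """Resources a hypothetical target can spare for wake word detection."""
    ram_kb: float
    flash_kb: float
    spare_mhz: float  # processing capacity not used by the rest of the application


def fits_worst_case(b: Budget) -> bool:
    """True if the budget covers the upper end of every quoted range."""
    return (
        b.ram_kb >= RAM_KB[1]
        and b.flash_kb >= FLASH_KB[1]
        and b.spare_mhz >= MHZ[1]
    )


def cpu_load_percent(core_mhz: float) -> tuple[float, float]:
    """Share of a core clocked at core_mhz used by 5-20 MHz of processing, in percent."""
    return (100.0 * MHZ[0] / core_mhz, 100.0 * MHZ[1] / core_mhz)


print(cpu_load_percent(200.0))  # (2.5, 10.0)
```

For example, on a core clocked at 200 MHz the quoted processing requirement corresponds to roughly 2.5–10 % of the CPU. Checking against the upper end of each range is a conservative choice; a design that places the model in flash may need less RAM than this check assumes.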


Q: What is the latency? 

A: The wake word is typically detected within ±200 ms of the subjective end of the wake word, which is faster than many other solutions. This helps reduce the latency perceived by the end user. 
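When evaluating detection timing on a labelled test set, the ±200 ms figure can serve as a tolerance window. The sketch below counts a labelled utterance as detected if any detection timestamp falls within 200 ms of its labelled end. The helpers within_tolerance and hit_rate, and the idea of millisecond timestamps from a labelled recording, are assumptions for illustration; false alarms are deliberately not scored here.

```python
# Hypothetical scoring helpers; the 200 ms value is the typical figure from this FAQ.
TOLERANCE_MS = 200.0


def within_tolerance(detect_ms: float, label_end_ms: float,
                     tol_ms: float = TOLERANCE_MS) -> bool:
    """True if a detection lies no more than tol_ms from the labelled end of the wake word."""
    return abs(detect_ms - label_end_ms) <= tol_ms


def hit_rate(detections_ms: list[float], label_ends_ms: list[float]) -> float:
    """Fraction of labelled utterances with at least one detection inside the window.

    Simplified: a single detection may match more than one label, and
    detections that match no label (false alarms) are ignored.
    """
    if not label_ends_ms:
        return 0.0
    hits = sum(
        any(within_tolerance(d, end) for d in detections_ms)
        for end in label_ends_ms
    )
    return hits / len(label_ends_ms)


# Two of three labelled utterances have a detection within +/-200 ms.
print(hit_rate([1150.0, 5400.0], [1000.0, 3000.0, 5300.0]))
```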


Q: What is the difference between VoiceSpot and VIT? 

A: VoiceSpot relies on a dataset to define the wake word, whereas VIT uses text input to define the wake word and commands. This makes VoiceSpot less flexible, but more resource-efficient and higher performing. VoiceSpot is therefore a great choice when high performance and efficiency are important and a dataset is already available or can be made available with reasonable effort. VIT is a great choice when flexibility is important and the additional platform resources it needs are not an issue. 
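The selection guidance above can be summarised as a simple rule of thumb. The sketch below encodes it; the Requirements fields and the suggest_engine function are hypothetical and only restate the criteria from this answer, so treat the result as a starting point for discussion rather than a definitive recommendation.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical project requirements mirroring the criteria in the answer above."""
    dataset_available: bool         # dataset exists or can be collected with reasonable effort
    needs_text_defined_words: bool  # wake words/commands must be definable from text
    tight_resources: bool           # RAM, flash or MHz are scarce


def suggest_engine(r: Requirements) -> str:
    """Rule of thumb: VIT for flexibility when resources allow, VoiceSpot for efficiency."""
    if r.needs_text_defined_words and not r.tight_resources:
        return "VIT"          # flexibility matters and extra resources are acceptable
    if r.dataset_available:
        return "VoiceSpot"    # efficiency/performance with an available dataset
    if not r.tight_resources:
        return "VIT"          # no dataset, but resources allow the text-based engine
    return "review trade-offs"  # neither set of criteria is fully met


print(suggest_engine(Requirements(True, False, True)))  # VoiceSpot
```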

Last update: 04-06-2022 12:04 PM