I built a small test app that runs an FFT and a GRU neural network, similar to this keyword spotter. When I run it on the MIMXRT1064-EVK, using the MCUXpresso bare-metal "echo sample" as a starting point, it takes 2500 microseconds. When I run the exact same code on a Teensy 4.1, compiled and deployed with PlatformIO in VS Code, it takes 500 microseconds: 5 times faster. The Teensy 4.1 also uses an RT1060, not the RT1064. Does anyone have any idea why my MIMXRT1064-EVK code is running so slowly? Does one have to do anything special to make the RT1064 on the dev board run at full clock speed? Or is the RAM on the dev board particularly slow?
Hi
The chips used in the two cases can run at the same speed so if the code were identical the execution time would logically be the same.
Differences could be:
- clock configuration (Teensy users tend to overclock the processor to get the fastest speed possible, which means running the processor out of specification - fun in hobby projects but a no-go in professional product development)
- code location (code run in QSPI flash (XiP) will be inherently slower than code run in internal RAM)
- cache configuration - depending on the memory layout the cache configuration could cause performance differences
- C library - different C libraries may have different characteristics and affect overall execution time
- Compiler optimisation setting - a project built with optimisation (especially optimisation for time) will run faster than one built without any optimisation
Therefore you will need to verify that the two instances use the same clock/cache configuration, operate the code from the same memory and also use the same compiler settings so that the execution time can be realistically compared.
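One way to separate a clock difference from a memory, cache or optimisation difference is to compare CPU cycle counts rather than wall-clock time. The following is a minimal sketch, not taken from either project: the helper names `cycles_to_us` and `implied_core_hz` are hypothetical, and the cycle counts are assumed to come from some cycle counter on the target (on a Cortex-M7 the DWT cycle counter is a common choice). If both boards report a similar cycle count but different times, the clock configuration differs; if the cycle counts themselves differ, the cause is more likely code location, cache or compiler settings.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert a CPU cycle count to microseconds
 * for a given core clock frequency in Hz. */
static inline uint32_t cycles_to_us(uint32_t cycles, uint32_t core_hz)
{
    assert(core_hz != 0u);
    /* 64-bit intermediate avoids overflow of cycles * 1000000. */
    return (uint32_t)(((uint64_t)cycles * 1000000u) / core_hz);
}

/* Hypothetical helper: core clock (Hz) implied by a cycle count and an
 * independently measured elapsed time in microseconds (for example from
 * a GPIO toggle observed on an oscilloscope). */
static inline uint32_t implied_core_hz(uint32_t cycles, uint32_t elapsed_us)
{
    assert(elapsed_us != 0u);
    return (uint32_t)(((uint64_t)cycles * 1000000u) / elapsed_us);
}
```

As a usage example, with a core clock of 600 MHz (an assumed example value), `cycles_to_us(1500000u, 600000000u)` gives 2500 and `cycles_to_us(300000u, 600000000u)` gives 500, so at equal clocks the two timings from the question would correspond to a five-fold difference in executed cycles.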
Regards
Mark
Thanks Mark, you've given me some things to look into.