I think that Thumb code is slower at processing arithmetic than the DSCs.
Most Kinetis instructions execute in a single clock cycle, but there can be other performance bottlenecks to consider. Do you have some example assembly code you are comparing for absolute performance?
When you say 'single point' do you mean single-precision-floating-point, or fixed point (integer or scaled fractional) math?
As for the performance of the DSP56800EX versus the Cortex-M4, I think it depends on the task you want to do. For FIR, IIR, FFT, or other regular algorithms, the DSP56800EX has higher performance.
For the DSP56800EX core, the assembly language supports parallel operations. For example, the following instruction executes in one clock cycle:
MAC Y0,X0,A X:(R0)+,Y X:(R3)+,X0
In one clock cycle, it computes the multiply-accumulate and stores the result in accumulator A, reads two operands from their respective memory locations into the Y and X0 registers, and updates the address registers R0 and R3. The core also supports a modulo addressing mode, so you do not need to reinitialize the address registers when a circular buffer wraps. Furthermore, it has a bit-reverse addressing function for FFT.
If the algorithm is irregular, I think the two cores have similar performance.
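To make it clearer what that single instruction covers, here is a rough, portable C model of one FIR output step. It is only a sketch and not DSC-specific code: the names `fir_state_t`, `fir_q15_step` and `NUM_TAPS` are hypothetical, Q15 fractional data is assumed, and the 64-bit accumulator merely stands in for the wider hardware accumulator. Each loop iteration does in C what the MAC with two parallel moves does in one cycle: one multiply-accumulate, two operand reads, and two pointer updates, with the explicit index wrap standing in for modulo addressing.

```c
#include <stdint.h>
#include <stddef.h>

#define NUM_TAPS 4u  /* hypothetical filter length, chosen for illustration */

/* Hypothetical circular delay line. On the DSC the wrap-around is handled
 * by modulo addressing hardware; here it is done explicitly in software. */
typedef struct {
    int16_t samples[NUM_TAPS];
    size_t  newest;  /* index of the most recently stored sample */
} fir_state_t;

/* Compute one FIR output sample in Q15 (assumed data format). */
int16_t fir_q15_step(fir_state_t *st, const int16_t coeffs[NUM_TAPS], int16_t input)
{
    /* Store the new sample, wrapping the write index by hand. */
    st->newest = (st->newest + 1u) % NUM_TAPS;
    st->samples[st->newest] = input;

    int64_t acc = 0;          /* wide accumulator so the sum cannot overflow */
    size_t  idx = st->newest; /* walks backwards through the delay line */

    for (size_t k = 0; k < NUM_TAPS; ++k) {
        /* One multiply-accumulate plus two operand reads per tap. */
        acc += (int32_t)coeffs[k] * (int32_t)st->samples[idx];
        /* Software stand-in for a modulo-addressed pointer update. */
        idx = (idx == 0u) ? (NUM_TAPS - 1u) : (idx - 1u);
    }

    /* Scale the Q30 sum back to Q15 and saturate to the 16-bit range. */
    acc /= 32768;
    if (acc > INT16_MAX) acc = INT16_MAX;
    if (acc < INT16_MIN) acc = INT16_MIN;
    return (int16_t)acc;
}
```

On a Cortex-M4 a compiler would typically turn the loop body into separate load, multiply-accumulate and index-update instructions, which is why regular kernels like this tend to favor the DSC.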
Hope this helps.