Hi, Marcos,
Regarding the performance of the DSP56800EX versus the Cortex-M4, I think it depends on the task you want to do. If you run FIR, IIR, FFT or other regular algorithms, the DSP56800EX has higher performance.
At the assembly level, the DSP56800EX core supports parallel operations. For example, the following instruction can execute in one clock cycle:
MAC Y0,X0,A X:(R0)+,Y X:(R3)+,X0
In that one clock cycle, it performs the multiply-accumulate and saves the result to accumulator A, reads two operands from their respective memory locations into the Y and X0 registers, and updates the address registers R0 and R3. The core also supports modulo addressing, so you do not need to reinitialize the address registers when a circular buffer wraps around. Furthermore, it has a bit-reverse function for FFT.
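To make this concrete, here is a rough C sketch of the work described above: one multiply-accumulate, two operand reads and the pointer updates per loop iteration, with the modulo wrap done by hand, plus a software bit reversal of the kind used for FFT index reordering. The function names fir_sample and bit_reverse, the 16-bit data type and the buffer layout are my own assumptions for illustration. This is not compiler output and not code from any NXP library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: one FIR output sample from a circular delay line.
 * Each loop iteration does what the text describes for a single
 * instruction: a multiply-accumulate, two operand reads, and two
 * pointer updates, one of them with modulo wrap-around. */
int32_t fir_sample(const int16_t *coeff, const int16_t *delay,
                   size_t taps, size_t start)
{
    int32_t acc = 0;      /* plays the role of accumulator A */
    size_t idx = start;   /* read position in the circular delay line */

    for (size_t k = 0; k < taps; ++k) {
        acc += (int32_t)coeff[k] * (int32_t)delay[idx]; /* multiply-accumulate */
        idx = (idx + 1 == taps) ? 0 : idx + 1;          /* modulo update in software */
    }
    return acc;
}

/* Hypothetical helper: reverse the lowest `bits` bits of x, as needed
 * when reordering FFT inputs or outputs. */
unsigned bit_reverse(unsigned x, unsigned bits)
{
    unsigned r = 0;
    for (unsigned i = 0; i < bits; ++i) {
        r = (r << 1) | (x & 1u);  /* move the lowest bit of x into r */
        x >>= 1;
    }
    return r;
}
```

On a core without hardware modulo or bit-reverse addressing, the wrap test and the bit-reversal loop are extra work in software, which is where a DSP core gains on regular algorithms like these.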
If the algorithm is irregular, I think the two cores have similar performance.
Hope this helps.
BR
XiangJun Rong