Hi Maxime,
You're right, I misread your code. It looked so much like the exponential averager that my brain told me it was one. In reality, it was a two-tap FIR filter. Now I understand why the coefficients were 0.5.
Your last question first: What is a pipeline stall?
Typically, the DSP starts a new instruction every cycle, but execution is pipelined, so each instruction typically takes 7 cycles to complete. In effect, 7 instructions are at different stages of execution at any point in time. The actual multiply in the ALU takes 2 cycles, so the result cannot be read out of the accumulator until one cycle after the multiply is executed. I use a NOP for that one-cycle delay. If the NOP is not there, the DSP is supposed to insert a "pipeline stall" automagically; however, my assembler flags an error when it detects a pipeline stall, so I need to insert that one-cycle delay explicitly.
As for the MAC instruction:
I see that you are expanding your FIR filter to three taps, and you are heading in the right direction. But if you were to expand to, say, 100 taps, you can see how tedious the code would get.
The repeat instruction ('rep'), coupled with the MAC instruction and the Address Generation Unit (AGU), allows you to build a FIR filter of any size with just a few instructions. Here is a sample:
;
; Set N to the number of taps you would like to have.
;
N: equ 32 ;for a 32 tap FIR filter
;
; This code only needs to be executed once to initialize the AGU registers.
;
move #CoefficientTable,R0 ;FIR filter coefficient table in x-memory
move #SampleTable,R4 ;samples-table to be filtered in y-memory
move #N-1,M4 ;set modulus register for N taps
move M4,M0 ;both modulus registers are the same
.
.
.
;
; This code gets executed for each new sample.
;
movep y:input,y:(R4) ;put sample in table over oldest sample
clr A x:(R0)+,X0 y:(R4)-,Y0 ;get 1st sample and coefficient
rep #N-1 ;do 'mac' for all taps except the last tap
mac X0,Y0,A x:(R0)+,X0 y:(R4)-,Y0 ;get next sample and coefficient
macr X0,Y0,A (R4)+ ;mac final tap, round, point R4 at the oldest sample
movep A,y:output ;ship the filtered value to the outside
;
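Notice that the address registers take care of themselves, wrapping around at the ends of their tables when they need to. (For this modulo addressing to work, each table must start at an address that is a multiple of the smallest power of two that is at least N, so a 32-entry table goes on a 32-word boundary.) Notice also that the 'mac' instruction inside the repeat loop not only multiplies and adds, but also fetches the next sample and coefficient for the following iteration. That means 1 cycle per tap. In the DSP that I'm using, that works out to 5 nanoseconds per tap, so a 100-tap FIR filter takes about half a microsecond (100 × 5 ns = 500 ns), plus a few cycles of overhead.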