Fastest way to iterate through array on M4

Content originally posted in LPCWare by wlamers on Thu Sep 04 02:32:16 MST 2014
I have a time-critical ISR that needs to iterate through an array of size 256 (preferably 1024, but 256 is the minimum) and check whether a value matches any of the array's contents. A bool is set to true if this is the case. The MCU is an LPC4357 (Cortex-M4 core), and the compiler is GCC. I have already combined optimisation level 2 (level 3 is slower) with placing the function in RAM instead of flash. I also use pointer arithmetic and a for loop that counts down instead of up (checking i!=0 is faster than checking i<256). All in all I end up with a duration of 12.5 µs, which has to be reduced drastically to be feasible. This is the (pseudo) code I use now:

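uint32_t i;
uint32_t *array_ptr = &theArray[0];   // theArray holds the 256 values to search
uint32_t compareVal = 0x1234ABCD;
bool validFlag = false;

// Count down so the loop test is a compare against zero
for (i = 256; i != 0; i--) {
    if (compareVal == *array_ptr++) {
        validFlag = true;
    }
}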

What would be the absolute fastest way to do this? Using inline assembly is allowed. Other 'less elegant' tricks are welcome too.
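
As a starting point for discussion, here is a minimal sketch of one such 'less elegant' trick: unrolling the loop by four and returning at the first match instead of always scanning the whole array. The function name array_contains_u32 and its parameters are hypothetical (they are not part of the code above), the array length is assumed to be a multiple of 4 (true for both 256 and 1024), and whether this is actually faster on the LPC4357 has not been measured here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not from the original post: returns true if `key`
 * occurs in the first `count` words of `data`.
 * Assumption: `count` is a multiple of 4 (holds for 256 and 1024). */
bool array_contains_u32(const uint32_t *data, size_t count, uint32_t key)
{
    const uint32_t *end = data + count;

    while (data != end) {
        /* Four loads per iteration: one loop branch per four elements,
         * and the compiler may be able to schedule the loads back to back. */
        uint32_t a = data[0];
        uint32_t b = data[1];
        uint32_t c = data[2];
        uint32_t d = data[3];

        /* Bitwise OR (not ||) so the four compares feed a single branch. */
        if ((a == key) | (b == key) | (c == key) | (d == key)) {
            return true;   /* early exit at the first block with a match */
        }
        data += 4;
    }
    return false;
}
```

In the ISR this would be called as validFlag = array_contains_u32(theArray, 256, compareVal);. The early exit only helps the average case; the worst case (no match) still touches every element, so the unrolling is what would have to carry the gain there.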