I accidentally posted my answer with the data you requested as a new reply in the thread instead of as a direct response to your post. I hope you still get a notification.
I was also looking into the code and noticed something that seemed odd to me:
In the MultprecMultiply(unsigned w_out[], const unsigned u[], const unsigned v[]) function, which is used to transform the signature into Montgomery form (ModExp() -> RSA_SignatureToPlaintextFast() -> MultprecMontPrepareX() -> MultprecMultiply()), the numbers u and v are multiplied using CASPER acceleration and the result is stored in w_out, i.e. w_out = u * v.
/* Step 1. Fill w[t-1:0] with 0s, the upper half will be written as we go */
PreZeroW(i, w_out);
/* We do 1st pass NOSUM so we do not have to 0 output */
Accel_SetABCD_Addr(CA_MK_OFF(&v[0]), CA_MK_OFF(u));
Accel_crypto_mul(
    Accel_IterOpcodeResaddr(N_wordlen / 2U - 1U, (uint32_t)kCASPER_OpMul6464NoSum, CA_MK_OFF(&w_out[0])));
Accel_done();
/* Step 2. iterate over N words of v using j */
for (j = 2U; j < N_wordlen; j += 2U)
{
    /* Step 2b. Check for 0 on v word - skip if so since we 0ed already */
    /* Step 3. Iterate over N words of u using i - perform Multiply-accumulate */
    if (0U != (GET_WORD(&v[j])) || 0U != (GET_WORD(&v[j + 1U])))
    {
        Accel_SetABCD_Addr(CA_MK_OFF(&v[j]), CA_MK_OFF(u));
        Accel_crypto_mul(
            Accel_IterOpcodeResaddr(N_wordlen / 2U - 1U, (uint32_t)kCASPER_OpMul6464Sum, CA_MK_OFF(&w_out[j])));
        Accel_done();
    }
}
It iterates through all the elements of the v[] array using the iterator variable j, as stated in Step 2. According to Step 3, the same is then done with all the elements of the u[] array using the iterator i. Where exactly does this happen? I see the multiply-accumulate of the CASPER acceleration engine (kCASPER_OpMul6464Sum), but I don't see where the iteration through all 64 elements of the u[] array takes place.
Might this be the reason for the wrong result?
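For comparison, this is how I would expect the loops to look in plain C, with Step 3 written out explicitly. It is only a sketch: ref_multiply() is my own hypothetical helper, not an SDK function, and it works on one 32-bit word at a time, whereas the SDK steps j by 2U because CASPER handles 64-bit (two-word) chunks. My unconfirmed assumption is that the first argument of Accel_IterOpcodeResaddr() (N_wordlen / 2U - 1U) is a repeat count that makes the hardware run the inner loop over u by itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical plain-C reference for w_out = u * v (schoolbook multiplication).
 * Not part of the SDK. Words are 32-bit, least-significant word first.
 * u and v hold n words each; w_out must hold 2 * n words and must not
 * alias u or v, because it is cleared before the inputs are read. */
static void ref_multiply(uint32_t w_out[], const uint32_t u[],
                         const uint32_t v[], size_t n)
{
    assert(w_out != u && w_out != v);

    /* Step 1: clear the whole output so every pass can accumulate. */
    memset(w_out, 0, 2 * n * sizeof(uint32_t));

    /* Step 2: iterate over the words of v using j. */
    for (size_t j = 0; j < n; j++)
    {
        /* Step 2b: a zero word of v contributes nothing, so skip it. */
        if (v[j] == 0U)
        {
            continue;
        }

        uint64_t carry = 0;
        /* Step 3: iterate over the words of u using i (multiply-accumulate).
         * This is the loop I cannot find in MultprecMultiply(); presumably
         * (assumption) the CASPER iteration count performs it in hardware. */
        for (size_t i = 0; i < n; i++)
        {
            /* Max value is (2^32-1)^2 + 2*(2^32-1) = 2^64-1, so no overflow. */
            uint64_t t = (uint64_t)u[i] * v[j] + w_out[i + j] + carry;
            w_out[i + j] = (uint32_t)t;
            carry = t >> 32;
        }
        /* No earlier pass has written w_out[j + n] yet, so store the carry. */
        w_out[j + n] = (uint32_t)carry;
    }
}
```

For example, with n = 2 and u = v = {0xFFFFFFFF, 0xFFFFFFFF}, i.e. 2^64 - 1, w_out becomes {0x00000001, 0x00000000, 0xFFFFFFFE, 0xFFFFFFFF}, which is (2^64 - 1)^2.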