Calculating SHA-256 using MMCAU v2 triggers HardFault

sachin_patel · ‎08-23-2021

Hi,

I'm currently using the Kinetis K32 L2A, which includes a CAU v2 module. I'm trying to use it to calculate a SHA-256 of my main image from my bootloader.

Here is my code:

#include "bootloader_sha256.h"

#include <stdint.h>
#include <stddef.h>

#include "fsl_mmcau.h"

int sha256_calculate(const uint32_t addr, const size_t size, uint32_t sha256sum[static 8])
{
    /* Confirm the SHA-256 hardware exists. */
    if (MMCAU_SHA256_InitializeOutput(sha256sum) != kStatus_Success) {
        return -1;
    }

    /* Block size is 512 bits (64 bytes). */
    if (MMCAU_SHA256_HashN((const uint8_t *) addr, (size / 64) + ((size % 64) != 0), sha256sum) != kStatus_Success) {
        return -1;
    }
    return 0;
}

Calling MMCAU_SHA256_HashN invokes HardFault_Handler.

For context, addr = 0x8100, which is 4-byte aligned (0 modulo 4) as required by the module, and the variable sha256sum is instantiated in the bootloader, so there should be no overlap.

Any hints would be helpful.

Thanks,

Sachin Patel

sachin_patel · ‎09-03-2021

So I've found a work-around. There is a bug in the library which doesn't permit accessing more than 1 block, even though the function supposedly supports it. What does work though is to just call the MMCAU_SHA256_HashN() function once for each block.

Here is my final working function:

int sha256_calculate(const uint32_t addr, const size_t size, uint32_t *sha256sum)
{
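    /*
     * Calculate the padded length of the SHA-256 input message: 1 byte for
     * 0x80, K zero bytes, and 8 bytes for the bit length. The loop below
     * reads that padding from memory directly after the image.
     */
    size_t sha256_len = size + 9;
    /* Cast to int so the subtraction is signed and K can go negative. */
    int K = 64 - 9 - (int) (size % 64);
    if (K < 0) {
        K += 64;
    }
    sha256_len += K;
    MMCAU_SHA256_InitializeOutput(sha256sum);
    /* Block size is 512 bits (64 bytes); hash one block per call. */
    for (size_t i = 0; i < sha256_len; i += 64) {
        MMCAU_SHA256_HashN((const uint8_t *) (addr + i), 1, sha256sum);
    }
    return 0;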
}

I've confirmed the output result matches the result given by OpenSSL, so I'll accept the additional overhead of re-calling the function a couple of hundred times. Hopefully the compiler optimises most of the overhead out.
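
As a cross-check on the length arithmetic in the function above, the number of 64-byte blocks can also be written in closed form. This is only a sketch: sha256_block_count is a hypothetical helper name, not part of the MMCAU driver, and it assumes the standard SHA-256 padding of one 0x80 byte, zero fill, and an 8-byte bit length.

```c
#include <stddef.h>

/* SHA-256 block size in bytes (512 bits). */
#define SHA256_BLOCK_BYTES 64u

/*
 * Hypothetical helper (not part of the MMCAU driver): number of 64-byte
 * blocks occupied by a message of `size` bytes once the mandatory padding
 * (one 0x80 byte, zero fill, 8-byte bit length) has been appended.
 */
size_t sha256_block_count(size_t size)
{
    /* 1 byte for 0x80 plus 8 bytes for the length, rounded up to a block. */
    return (size + 9u + (SHA256_BLOCK_BYTES - 1u)) / SHA256_BLOCK_BYTES;
}
```

Multiplying the result by 64 should give the same value as sha256_len in the loop above. For example, a 55-byte message still fits in one block, while a 56-byte message needs two.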

sachin_patel · ‎08-28-2021

I've been debugging using the MMCAU API Example Project for the FRDM-K32L2A4S Board in the MCUXpresso Environment on Windows (I typically build standalone using the Embedded GCC Toolchain in Ubuntu Linux for our custom hardware). Running the example as-is with the test string, which is only one block long, works fine. So instead, I generated a binary file in Ubuntu and appended the padding as required by FIPS 180-2. As I'd seen previously, the program triggers the HardFault_Handler. The difference is that with the MCUXpresso IDE I can use breakpoints and the disassembler to determine where the program is crashing:

Disassembler Snippet:

00000474: mov r9, r5
00000476: mov r10, r6
00000478: mov r11, r7
0000047a: ldr r5, [pc, #600] ; (0x6d4 <mmcau_sha256_hash+20>)
next_blk:
0000047c: ldmia r0!, {r7}
0000047e: rev r7, r7
00000480: str r7, [sp, #0]
00000482: mov r6, r9
00000484: str r7, [r6, #0]
00000486: str r3, [r5, #0]
00000488: ldmia r4!, {r7}
0000048a: mov r6, r10
0000048c: str r7, [r6, #0]
0000048e: str r2, [r5, #0]
00000490: str r1, [r5, #0] <-- Registers measured at this point, next instruction causes program crash
00000492: ldmia r0!, {r7}
00000494: rev r7, r7
00000496: str r7, [sp, #4]
00000498: mov r6, r9
0000049a: str r7, [r6, #0]

Registers:

r0 0x5410 Argument/Scratch Register 1
r1 0xd4000000 Argument/Scratch Register 2
r2 0xa6994327 Argument/Scratch Register 3
r3 0x96594b26 Argument/Scratch Register 4
r4 0x1fff80d4 Variable Register 1
r5 0xf0005000 Variable Register 2
r6 0xf00058c4 Variable Register 3
r7 0x428a2f98 Variable Register 4
r8 0x1 Variable Register 5
r9 0xf0005844 Variable Register 6
r10 0xf00058c4 Variable Register 7
r11 0xf0005884 Variable Register 8
r12 0x20017ef8 Intra-Procedure-Call Scratch Register
sp 0x20017dac Stack Pointer (r13)
lr 0x6b3 <mmcau_sha256_update+34> Link Register (r14)
pc 0x490 <next_blk+20> Program Counter (r15)
xpsr 0x21000000 Program Status Register
msp 0x20017dac Main Stack Pointer
psp 0xffeffffc Process Stack Pointer
control 0x0 Control Register
primask 0x0 Interrupt/Exception Mask Register
cycles 0
Status Registers Status Registers for cortex-m0plus
apsr nzCvq Application Program Status Register
ipsr no fault Interrupt Program Status Register
epsr T Execution Program Status Register

From my understanding, the crash happens on the second loop into "next_blk", i.e. when the data block size is greater than 1.

Any further support would be greatly appreciated.

Thanks,

Sachin Patel

FelipeGarcia · ‎08-31-2021

Hi Sachin,

MMCAU_SHA256_Update should work with multiple blocks, as long as each block is 512 bits in size. How many blocks are you trying to process? Did you pad the message correctly so that every block is exactly 64 bytes?

Best regards,

Felipe

sachin_patel · ‎09-02-2021

Hi @FelipeGarcia,

Yes, every block is exactly 64 bytes matching the padding as specified by FIPS 180-2.

For my code, I want to process about 200 blocks, and possibly more in the future, but the hard fault happens on block 2.

Thanks,

Sachin Patel

sachin_patel · ‎08-25-2021

After some investigation into the MMCAU example project, it appears that for the CAU module to compute a SHA-256 message digest, the data must already have been preprocessed by applying the padding defined by the SHA-256 specification. However, even after applying this change to my own program, it still hard faults on calling MMCAU_SHA256_HashN. I’ll investigate further with the example project.
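
For anyone following along, the preprocessing step can be sketched in plain C as below. This is a minimal sketch under stated assumptions, not code from the MMCAU example project: sha256_pad is a hypothetical helper name, and it assumes the message sits at the start of a writable buffer with enough spare capacity for the padding (one 0x80 byte, zero fill, then the message length in bits as a 64-bit big-endian value).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the MMCAU driver). Appends SHA-256
 * padding to the `len` message bytes already stored at the start of `buf`.
 * Returns the padded length (a multiple of 64), or 0 if `cap` is too small.
 */
size_t sha256_pad(uint8_t *buf, size_t len, size_t cap)
{
    size_t padded = ((len + 9u + 63u) / 64u) * 64u;
    uint64_t bits = (uint64_t)len * 8u;

    if (padded > cap) {
        return 0;
    }

    buf[len] = 0x80u;                              /* mandatory '1' bit    */
    memset(&buf[len + 1u], 0, padded - len - 9u);  /* zero fill            */

    for (size_t i = 0; i < 8u; i++) {              /* big-endian bit count */
        buf[padded - 1u - i] = (uint8_t)(bits >> (8u * i));
    }
    return padded;
}
```

The returned length divided by 64 is the block count to hand to the hashing call. For an image in flash, the same bytes would have to be appended to the binary at build time, since the helper writes into the buffer it is given.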

sachin_patel · ‎08-25-2021

Can a representative of NXP please confirm if the module requires the data to be preprocessed before the calculation can occur?

Thanks,

Sachin Patel

FelipeGarcia · ‎08-27-2021

Hi Sachin,

You are correct: hashing requires padding, as described in AN4307.

Have a great day,

Felipe

sachin_patel · ‎08-28-2021

Hi @FelipeGarcia ,

Is there any proof that the MMCAU SHA-256 algorithm works for messages longer than one block? My current theory is that the CAU fails on the second block calculation. The application note software only tests a single-block message.

Thanks,

Sachin Patel

bobpaddock · ‎08-23-2021

[static 8]

Looks odd to me.

Where exactly does the 'static 8' get allocated, and does that happen on each call to the function, eventually running out of memory?

Does

static uint32_t sha256sum[ 8U ];

then

int sha256_calculate(const uint32_t addr, const size_t size, uint32_t *sha256sum )
{

...

}

work?

sachin_patel · ‎08-24-2021

The static 8 syntax is a valid way to ensure that the array being passed to the function has a minimum of 8 members and is syntactically analogous to passing the pointer uint32_t *sha256sum. Using the pointer doesn’t change the output binary.
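
To illustrate, here is a minimal C99 sketch; fill_words is a hypothetical function used only for this example. The two prototypes are compatible declarations of the same function, because an array parameter (with or without static 8) is adjusted to a pointer, and nothing is allocated by the parameter declaration itself.

```c
#include <stdint.h>

/* Both declarations name the same function type: an array parameter,
 * with or without `static 8`, is adjusted to `uint32_t *`. */
void fill_words(uint32_t out[static 8]);
void fill_words(uint32_t *out);

/* Hypothetical example function: writes 0..7 into the caller's array.
 * `static 8` is only a promise from the caller that at least 8 elements
 * are accessible; the storage belongs to the caller. */
void fill_words(uint32_t out[static 8])
{
    for (uint32_t i = 0; i < 8u; i++) {
        out[i] = i;
    }
}
```

A caller declares its own uint32_t array of 8 or more elements and passes it in, exactly as it would with the pointer form.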

The problem is not the sha256sum array anyway, as the initialise-output call does not fail: the hash values do get initialised, and it confirms the hardware is valid. It’s on computation that the program breaks.

Thanks,

Sachin Patel
