Does the source address passed to bootaux have to be 8-byte aligned?

xiaokaoy · 02-23-2021

Hello

I'm working with imx8dual and trying to start the M4 from U-Boot using the bootaux command. I've found that bootaux only succeeds if the source address passed to it is 8-byte aligned.

For example:

With bootaux 0x90000000, M4 starts and runs successfully.

With bootaux 0x90000004, it does not.

Reading the bootaux source code, I can see that bootaux essentially calls arch_auxiliary_core_up, which in turn calls memcpy((void *)aux_core_ram, (void *)addr, size).

With bootaux 0x90000004, however, M4 does start if I make any one of the following changes:

(1) call memset((void *)aux_core_ram, 0, size) before

memcpy((void *)aux_core_ram, (void *)addr, size)

(2) call memset((void *)aux_core_ram, 0xff, size) before

memcpy((void *)aux_core_ram, (void *)addr, size)

(3) replace memcpy((void *)aux_core_ram, (void *)addr, size) with

for (int i=0; i<size/4; i++)
*((uint32_t*)aux_core_ram+i) = *((uint32_t*)addr+i);

I should add that the byte-by-byte version
for (int i=0; i<size; i++)
*((uint8_t*)aux_core_ram+i) = *((uint8_t*)addr+i);

doesn't help.

Can anyone explain this?


jimmychan · 03-04-2021

I will check this for you.

xiaokaoy · 03-04-2021

Thanks.

Our tests show that at least 128 KB must be written into TCM, and that it must be written word by word (32 or 64 bits at a time).

Otherwise the M4 core does not start running successfully (at least, it produces none of the expected output).

The memcpy provided by U-Boot copies byte by byte unless both the source and destination start addresses are multiples of 8

(See https://source.codeaurora.org/external/imx/uboot-imx/tree/lib/string.c?h=imx_v2020.04_5.4.70_2.3.0&i...).

Thus, if the M4 image is not placed at an 8-byte aligned address in DDR memory, the memcpy in arch_auxiliary_core_up

(at https://source.codeaurora.org/external/imx/uboot-imx/tree/arch/arm/mach-imx/imx8/cpu.c?h=imx_v2020.0...)

will copy byte by byte.

In that case, the M4 core won’t be able to start successfully.
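To make the alignment argument above concrete, here is a small model of the path selection the post describes for the generic memcpy in U-Boot's lib/string.c. It is a sketch, not U-Boot code: the function name uboot_memcpy_word_bytes is hypothetical, and it assumes that unsigned long is 64 bits (as on the AArch64 A-cores) and that this generic memcpy, rather than an architecture-optimized one, is the one built into the image.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: unsigned long is 64 bits, as on the AArch64 A-cores of i.MX8. */
#define UBOOT_LONG_SIZE 8u

/*
 * Hypothetical model (not U-Boot code) of the generic memcpy() path choice:
 * if dest and src are both aligned to sizeof(long), whole longs are copied
 * first and only the tail goes byte by byte; otherwise every byte is copied
 * individually. Returns the number of bytes copied a long at a time.
 */
static size_t uboot_memcpy_word_bytes(uintptr_t dest, uintptr_t src, size_t count)
{
    if (((dest | src) & (UBOOT_LONG_SIZE - 1)) != 0)
        return 0; /* misaligned pair: every byte goes through the byte loop */
    return count - count % UBOOT_LONG_SIZE; /* whole longs; tail is bytewise */
}
```

With the addresses from the original question, a source at 0x90000000 copies the whole 128 KB as 64-bit words into 0x34FE0000, while a source at 0x90000004 sends every byte through the byte loop.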

jimmychan · 03-08-2021

I got this reply:

Has the customer changed the load address in the M4 app linker script?

In the command "bootaux <addr>", <addr> needs to be aligned with the entry address defined in the M4 app linker script.

xiaokaoy · 03-08-2021

Thanks.

I didn't change the load address in the M4 app linker script.

The address in "bootaux <addr>" command is an address in the DDR RAM.

bootaux will copy the M4 app bin file from there to M4's TCM before kicking off M4.

What does "aligned with the entry address defined in the M4 app linker script" mean?

jimmychan · 03-10-2021

The load address of the M4 image should be the same as the entry address defined in the linker script.

xiaokaoy · 03-10-2021

Thanks. But I guess that requirement is for imx7. I'm using imx8dual.

jimmychan · 03-11-2021

What do you mean by "But I guess that requirement is for imx7. I'm using imx8dual."?

xiaokaoy · 03-11-2021

https://source.codeaurora.org/external/imx/uboot-imx/tree/arch/arm/mach-imx/imx_bootaux.c?h=imx_v202...

I think this function is for imx7, and that its addr parameter must be the start address of the M4 TCML as seen from the A core.

However, https://source.codeaurora.org/external/imx/uboot-imx/tree/arch/arm/mach-imx/imx8/cpu.c?h=imx_v2020.0...

This function is for imx8qxp, and its boot_private_data parameter (i.e. the addr passed to bootaux) doesn't have to be the same as the address the linker script specifies. In fact, they must not be the same (see

https://source.codeaurora.org/external/imx/uboot-imx/tree/arch/arm/mach-imx/imx8/cpu.c?h=imx_v2020.0...).

jimmychan · 03-22-2021

Do you mean that for M4 image on imx8qxp, the entry address defined in linker script can be different from the load address in memory?

On imx8qxp, for example, the TCML address is 0x1FFE0000 from the CM4 local view and 0x34FE0000 from the AP view.

So in the linker script of the M4 image, the entry address is defined as 0x1FFE0000.

But in U-Boot, this image will be loaded to 0x34FE0000, which is 0x1FFE0000 from the CM4 local view.
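The two address views described above differ only by a fixed offset. A minimal sketch of the translation, using the base addresses quoted in this post; the helper name tcml_ap_to_m4 and the 128 KB TCML size are assumptions, not values taken from NXP headers.

```c
#include <assert.h>
#include <stdint.h>

/* Base addresses quoted in the thread for imx8qxp. */
#define TCML_AP_BASE 0x34FE0000u /* TCML as seen by the A-cores (AP view) */
#define TCML_M4_BASE 0x1FFE0000u /* TCML as seen by the CM4 (local view) */
/* Assumption: TCML is 128 KB, ending where 0x20000000 begins in the M4 view. */
#define TCML_SIZE    0x20000u

/* Hypothetical helper: translate an AP-view TCML address to the CM4 local view. */
static uint32_t tcml_ap_to_m4(uint32_t ap_addr)
{
    /* Only addresses inside the TCML window can be translated this way. */
    assert(ap_addr >= TCML_AP_BASE && ap_addr - TCML_AP_BASE < TCML_SIZE);
    return ap_addr - TCML_AP_BASE + TCML_M4_BASE;
}
```

For instance, the image loaded by U-Boot at 0x34FE0000 starts at 0x1FFE0000 from the M4's point of view, which is why the linker script uses the latter.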

xiaokaoy · 03-24-2021

Thanks, jimmychan. I knew that.

What bootaux does is copy the M4 image from somewhere in the DDR RAM to 0x34FE0000 and then kick off M4.

I've found that if it copies one 32/64-bit integer after another, it's OK. Otherwise (e.g. copies 8/16-bit integer after another) M4 would be unable to start/run successfully.

I was wondering if you could confirm that it is really required to copy the M4 image to 0x34FE0000 (i.e. the TCM of M4) that way.

jimmychan · 04-05-2021

I got a reply from the internal team.

About "if it copies one 32/64-bit integer after another, it's OK. Otherwise (e.g. copies 8/16-bit integer after another) M4 would be unable to start/run successfully.", it seems to be related to alignment.

The RM says:

"

Because AHB-Lite does not support write data strobes when accessing AHB-Lite slaves from an AXI master, care must be taken not to generate transactions that have partial strobes. Make sure to not have unaligned accessing to TCM from an AXI master. For example, when writing data to TCM from A53, ensure every write strobe address is 64bit aligned. When the MMU is enabled, the TCM memory range must have the MT_DEVICE_NGNRNE type attribute set. This will avoid A53 sparse writes to the TCM memory region.

"

I don't know if this explanation is related to your findings.

Could you do more tests on it?

The M4 image can also run from DDR, so it is not strictly required to copy the image to TCM at 0x34FE0000.
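One way to follow the RM rule quoted above from the A-core side is to never issue a partial-width store to TCM: write only naturally aligned 64-bit words and zero-pad the final word. This is a hedged sketch; tcm_copy64 is a hypothetical name, and it only deals with write strobes. It does not replace the MT_DEVICE_NGNRNE memory attribute that the RM also asks for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy an image into TCM using only aligned 64-bit
 * stores. dst must be 8-byte aligned (e.g. the TCML base in the AP view);
 * src may have any alignment. The last partial word is zero-padded so no
 * store carries a partial byte strobe. Returns bytes written (size rounded
 * up to a multiple of 8).
 */
static size_t tcm_copy64(volatile uint64_t *dst, const void *src, size_t size)
{
    const uint8_t *s = src;
    size_t words = (size + 7) / 8;

    assert(((uintptr_t)dst & 7u) == 0);

    for (size_t i = 0; i < words; i++) {
        size_t left = size - i * 8;
        size_t n = left < 8 ? left : 8;
        uint64_t w = 0;          /* bytes past the image end stay zero */
        memcpy(&w, s + i * 8, n); /* unaligned read from the DDR source */
        dst[i] = w;               /* one full 64-bit store to TCM */
    }
    return words * 8;
}
```

Reading the source through memcpy keeps the example free of unaligned-pointer casts, so it also works when the image sits at an address such as 0x90000004.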

xiaokaoy · 04-06-2021

I've read the excerpt of the RM before. Thanks, anyway.

I've also found that writing 64/32-bit word by word is not enough; I must write the whole TCML, even if my M4 image is actually far smaller than TCML in size.

Furthermore, if I first zero the whole TCML using 64- or 32-bit writes, then copying the M4 image to TCML byte by byte also works.

jimmychan · 04-25-2021

Hello,

I got this reply from the AE:

----------------

I think the reason is ECC. In iMX8, ECC is enabled in TCM.

When ECC is enabled, the user needs to do an ECC clean first (write 0 to the whole TCML) and then write the custom image to the TCML area.

----------------

Best regards,

Jimmy
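The AE's suggestion amounts to a two-step load: clean the whole TCML first, then copy the image. A minimal sketch, assuming the TCML base is 8-byte aligned and its size is a multiple of 8 (on imx8qxp, 0x34FE0000 in the AP view and, assumed here, 128 KB). The function name is hypothetical; in U-Boot this step would presumably go just before the memcpy in arch_auxiliary_core_up. The zeroing uses 64-bit stores, and the copy is deliberately byte by byte, because xiaokaoy reports that a byte-wise copy works once the whole TCML has been zeroed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical sketch of "ECC clean, then load": zero every word of TCML
 * with full 64-bit stores, then copy the image into the start of TCML.
 */
static void tcml_ecc_clean_and_load(volatile uint64_t *tcml, size_t tcml_size,
                                    const void *image, size_t image_size)
{
    assert(((uintptr_t)tcml & 7u) == 0 && tcml_size % 8 == 0);
    assert(image_size <= tcml_size);

    /* Step 1: write 0 to the whole TCML, not just the image-sized part. */
    for (size_t i = 0; i < tcml_size / 8; i++)
        tcml[i] = 0;

    /* Step 2: copy the image; per the thread, bytewise is fine after step 1. */
    volatile uint8_t *dst = (volatile uint8_t *)tcml;
    const uint8_t *src = image;
    for (size_t i = 0; i < image_size; i++)
        dst[i] = src[i];
}
```

Cleaning the full TCML rather than only size bytes matches both the AE's wording and the earlier observation that at least 128 KB had to be written.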

xiaokaoy · 04-26-2021

Thank you, jimmychan.