Does the source address passed to bootaux have to be 8-byte aligned?

Solved

xiaokaoy
Contributor I

Hello,

I'm working with imx8dual and trying to start the M4 from U-Boot using the bootaux command. I've found that bootaux only succeeds if the source address passed to it is 8-byte aligned.

For example:

With bootaux 0x90000000, the M4 starts and runs successfully.

With bootaux 0x90000004, the M4 does not start.

Reading the source code of bootaux, we can see that it basically calls arch_auxiliary_core_up, which in turn calls memcpy((void *)aux_core_ram, (void *)addr, size).

 

In the case of bootaux 0x90000004, however, the M4 also starts if I make any one of the following changes:

(1) Call memset((void *)aux_core_ram, 0, size) before memcpy((void *)aux_core_ram, (void *)addr, size).

(2) Call memset((void *)aux_core_ram, 0xff, size) before memcpy((void *)aux_core_ram, (void *)addr, size).

(3) Replace memcpy((void *)aux_core_ram, (void *)addr, size) with a copy of one 32-bit word at a time:

for (int i = 0; i < size / 4; i++)
    *((uint32_t *)aux_core_ram + i) = *((uint32_t *)addr + i);

I must add that a byte-by-byte copy doesn't help:

for (int i = 0; i < size; i++)
    *((uint8_t *)aux_core_ram + i) = *((uint8_t *)addr + i);

 

Can anyone explain this?

 

jimmychan
NXP TechSupport

I will check this for you.

xiaokaoy
Contributor I

Thanks.

 

Our tests show that at least 128K must be written into the TCM, and that it must be written word by word (32 or 64 bits at a time).

Otherwise the M4 core cannot start successfully (at least, the expected output never appears).

The memcpy provided by U-Boot copies byte by byte unless both the source and the destination start addresses are multiples of 8

(See https://source.codeaurora.org/external/imx/uboot-imx/tree/lib/string.c?h=imx_v2020.04_5.4.70_2.3.0&i...).

Thus, if the M4 image is not placed at an 8-byte aligned address in DDR memory, the memcpy in arch_auxiliary_core_up

(at https://source.codeaurora.org/external/imx/uboot-imx/tree/arch/arm/mach-imx/imx8/cpu.c?h=imx_v2020.0...)

copies byte by byte.

In that case, the M4 core cannot start successfully.
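
The alignment condition described in this post can be modelled on a host machine. The following is only a minimal sketch of that condition, not the U-Boot source: word_copy_eligible is a hypothetical helper, and the 8-byte word size is an assumption taken from the "multiple of 8" statement above.

```c
#include <stdint.h>

/* Hypothetical helper (not a U-Boot function): models the condition
 * described above, where a word-at-a-time copy is used only if BOTH
 * the destination and the source address are multiples of the word
 * size. The word size is assumed to be 8 bytes. */
static int word_copy_eligible(uint64_t dest, uint64_t src)
{
    const uint64_t word_size = 8;

    /* OR-ing the addresses lets one mask test both at once. */
    return ((dest | src) & (word_size - 1)) == 0;
}
```

Under this model, a destination of 0x34FE0000 with a source of 0x90000000 qualifies for the word copy, while a source of 0x90000004 does not and would fall back to the byte-by-byte path.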

 

jimmychan
NXP TechSupport

I got this reply:

Has the customer changed the load address in the M4 app linker script?

In the command "bootaux <addr>", the <addr> needs to be aligned with the entry address defined in M4 app linker script.

xiaokaoy
Contributor I

Thanks.

I didn't change the load address in M4 app linker script.

The address in "bootaux <addr>" command is an address in the DDR RAM. 

bootaux will copy the M4 app bin file from there to M4's TCM before kicking off M4.

What does "aligned with the entry address defined in M4 app linker script" mean?

jimmychan
NXP TechSupport

The load address of the M4 image should be the same as the entry address defined in the linker script.

xiaokaoy
Contributor I

Thanks. But I guess that requirement is for imx7. I'm using imx8dual.

jimmychan
NXP TechSupport

What do you mean by "But I guess that requirement is for imx7. I'm using imx8dual."?

xiaokaoy
Contributor I

https://source.codeaurora.org/external/imx/uboot-imx/tree/arch/arm/mach-imx/imx_bootaux.c?h=imx_v202...

I think that this function is for imx7, and that the addr parameter for it must be the start address of the M4 TCML as seen from the A core.

However, see https://source.codeaurora.org/external/imx/uboot-imx/tree/arch/arm/mach-imx/imx8/cpu.c?h=imx_v2020.0...

This function is for imx8qxp, and the boot_private_data parameter (i.e. the addr for bootaux) doesn't have to be the same as what the linker script specifies. Actually, they mustn't be the same; see:

https://source.codeaurora.org/external/imx/uboot-imx/tree/arch/arm/mach-imx/imx8/cpu.c?h=imx_v2020.0...

 

 

jimmychan
NXP TechSupport

Do you mean that for an M4 image on imx8qxp, the entry address defined in the linker script can be different from the load address in memory?

On imx8qxp, for example, the TCML address is 0x1FFE0000 from the CM4 local view and 0x34FE0000 from the AP view.

So in the linker script of the M4 image, the entry address is defined as 0x1FFE0000.

But in U-Boot, this image is loaded to 0x34FE0000, which is 0x1FFE0000 from the CM4 local view.
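
The relationship between the two views in this post can be written as a fixed offset. The following is a minimal sketch using the two base addresses quoted above; tcml_m4_to_ap is a hypothetical helper, and the 128 KB window size is an assumption (it matches the 128K figure mentioned earlier in the thread, but check the RM for your part).

```c
#include <stdint.h>

/* Base addresses quoted in the post above for imx8qxp. */
#define TCML_M4_BASE 0x1FFE0000u /* TCML from the CM4 local view */
#define TCML_AP_BASE 0x34FE0000u /* TCML from the AP (A core) view */

/* Assumption: the TCML window is 128 KB. */
#define TCML_SIZE    0x00020000u

/* Hypothetical helper: translate a CM4-local TCML address into the
 * address the A core must use to reach the same byte.
 * Returns 0 if the address lies outside the assumed TCML window. */
static uint32_t tcml_m4_to_ap(uint32_t m4_addr)
{
    if (m4_addr < TCML_M4_BASE || m4_addr - TCML_M4_BASE >= TCML_SIZE)
        return 0;

    return m4_addr - TCML_M4_BASE + TCML_AP_BASE;
}
```

With these values, the linker-script entry address 0x1FFE0000 maps to the U-Boot destination 0x34FE0000, which is why the two numbers differ even though they name the same memory.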

xiaokaoy
Contributor I

Thanks, jimmychan. I knew that.

What bootaux does is copy the M4 image from somewhere in the DDR RAM to 0x34FE0000 and then kick off M4.

I've found that if it copies one 32/64-bit integer after another, it's OK. Otherwise (e.g. copies 8/16-bit integer after another) M4 would be unable to start/run successfully.

I was wondering if you could confirm that it is really required to copy the M4 image to 0x34FE0000 (i.e. the TCM of M4) that way.

jimmychan
NXP TechSupport

I got a reply from the internal team.

 

About "if it copies one 32/64-bit integer after another, it's OK. Otherwise (e.g. copies 8/16-bit integer after another) M4 would be unable to start/run successfully.", it seems to be related to alignment.

The RM says:

"Because AHB-Lite does not support write data strobes when accessing AHB-Lite slaves from an AXI master, care must be taken not to generate transactions that have partial strobes. Make sure to not have unaligned accessing to TCM from an AXI master. For example, when writing data to TCM from A53, ensure every write strobe address is 64bit aligned. When the MMU is enabled, the TCM memory range must have the MT_DEVICE_NGNRNE type attribute set. This will avoid A53 sparse writes to the TCM memory region."

I don't know if this explanation is related to your findings.

Could you do more tests on it?

The M4 image can also run from DDR, so it is not required to copy the image to the TCM at 0x34FE0000.

 

xiaokaoy
Contributor I

I had already read that excerpt of the RM. Thanks, anyway.

I've also found that writing in 64/32-bit words is not enough; I must write the whole TCML, even if my M4 image is actually far smaller than the TCML.

Furthermore, if I first zero the whole TCML using 64/32-bit word writes (each with the value 0), then copying the M4 image to the TCML byte by byte also works.

jimmychan
NXP TechSupport
Accepted solution

Hello,

 

I got this reply from the AE:

----------------

I think the reason is ECC. In iMX8, ECC is enabled in TCM.

When ECC is enabled, the user needs to do an ECC clean first (write 0 to the whole TCML) and then write the customized image to the TCML area.

----------------

 

Best regards,

Jimmy
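
The two steps in the AE's reply could be sketched as follows. This is an illustration under stated assumptions, not NXP's or U-Boot's implementation: tcml_clean_and_load is a hypothetical helper, the caller is assumed to pass the A-core view of the TCML (8-byte aligned) and its full size, and using 64-bit stores for both steps is a choice made here to stay consistent with the RM excerpt quoted earlier in the thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: zero the whole TCML, then write the image,
 * using only full 64-bit stores to the TCML.
 * Returns 0 on success, -1 if the arguments do not fit. */
static int tcml_clean_and_load(volatile uint64_t *tcml, size_t tcml_size,
                               const void *image, size_t image_size)
{
    const uint8_t *src = image;
    size_t words = tcml_size / sizeof(uint64_t);
    size_t i;

    if (tcml_size % sizeof(uint64_t) != 0 || image_size > tcml_size)
        return -1;

    /* Step 1, the "ECC clean": write 0 to the whole region, even if
     * the image is much smaller than the region. */
    for (i = 0; i < words; i++)
        tcml[i] = 0;

    /* Step 2: write the image one 64-bit word at a time. Each word is
     * assembled in a local variable first, so the source may start at
     * any alignment; a short tail is padded with zeros. */
    for (i = 0; image_size > 0; i++) {
        uint64_t word = 0;
        size_t n = image_size < sizeof(word) ? image_size : sizeof(word);

        memcpy(&word, src, n);
        tcml[i] = word;
        src += n;
        image_size -= n;
    }

    return 0;
}
```

On the target, the first argument would presumably be the TCML address as seen from the A core (0x34FE0000 in this thread); on a host it can be any 8-byte aligned buffer, which makes the logic easy to test before trying it in U-Boot.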

xiaokaoy
Contributor I

Thank you, jimmychan.
