Issue enabling second core using custom boot (P2020)

edvinl · 03-08-2023

Hello,

We have written a custom boot that sets up and runs on core 0 of the P2020 processor. Our boot should enable the second core and simply put it in an infinite loop, but we are having problems getting it to run correctly.

Setup

The execution of core 0 is as follows:

Core 0 is set up and configured. A LAW is set up covering the complete RAM.
A 4096-byte area at the end of RAM (starting at 0x3FFFF000) is defined as the boot page for CPU 1, and the CPU 1 startup code is copied to that area (see the code below).
The boot page translation register (Reset_BPTR) is updated:

Boot page translation is enabled by setting the EN bit to 1.
The translation address is set to the boot page start address shifted right by 12 bits.

Register ECM_EEBPCR is written to set bit CPU1_EN, which enables core 1.
Core 0 then enters an infinite loop doing nothing (for now).
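For reference, the BPTR value described in the steps above can be computed as in the following minimal C sketch. It assumes, as in the posted code, that EN is the most significant bit of BPTR and that the translation field holds the boot page address shifted right by 12 bits; the helper name bptr_encode is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define BOOT_PAGE_SIZE       0x1000u      /* 4 KiB boot page */
#define BPTR_EN              0x80000000u  /* EN: bit 0 in MSB-first numbering */
#define BPTR_BOOT_PAGE_SHIFT 12u

/* Hypothetical helper: build the Reset_BPTR value for a boot page located at
 * boot_page_addr. Returns false if the address is not 4 KiB aligned, since
 * the translation can only redirect whole 4 KiB pages. */
static bool bptr_encode(uint32_t boot_page_addr, uint32_t *bptr_out)
{
    if ((boot_page_addr & (BOOT_PAGE_SIZE - 1u)) != 0u) {
        return false;
    }
    *bptr_out = BPTR_EN | (boot_page_addr >> BPTR_BOOT_PAGE_SHIFT);
    return true;
}
```

With the boot page at 0x3FFFF000 this yields 0x8003FFFF, which matches the register value reported further down.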

Core 1 is initialised in the StartSecondaryCpu assembly function, which sets up, for example, a TLB entry, the stack pointer and the interrupt registers. See the code below.

Output

The boot was programmed and executed normally. We then debugged it with CodeWarrior and the CodeWarrior TAP; using an attach, we were able to connect to both cores of the P2020 processor.

We have verified for Core 0 that:

The Reset_BPTR register is set to 0x8003FFFF.
The ECM_EEBPCR register value goes from 0x01000000 to 0x03000000.
The instructions of the StartSecondaryCpu assembly function were successfully copied to BootPageAddress.
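The observed register values can be decoded with a small C sketch like the one below, which may help when reading them back in the debugger. The bit positions are assumptions: EN in bit 0 and the page number in bits 8 to 31 of BPTR, and CPUn_EN at bit 7 - n of ECM_EEBPCR (bit 6 for CPU1_EN matches ECM_EEBPCR_CPU1_EN_BIT in the posted code; bit 7 for CPU0_EN is inferred from the initial value 0x01000000). All function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* MSB-first (Power Architecture) bit number to a 32-bit mask,
 * equivalent to SET_BIT_REG32 in the posted code. */
#define PPC_BIT32(n)   (1u << (31u - (n)))

#define BPTR_EN        PPC_BIT32(0u)
#define BPTR_PAGE_MASK 0x00FFFFFFu  /* assumed: bits 8-31 hold the page number */

/* Physical boot page address encoded in a Reset_BPTR value. */
static uint64_t bptr_boot_page(uint32_t bptr)
{
    return (uint64_t)(bptr & BPTR_PAGE_MASK) << 12;
}

/* True if boot page translation is enabled. */
static bool bptr_enabled(uint32_t bptr)
{
    return (bptr & BPTR_EN) != 0u;
}

/* True if core `cpu` (0 or 1) is enabled in an ECM_EEBPCR value,
 * assuming CPUn_EN sits at MSB-first bit 7 - n. */
static bool eebpcr_cpu_enabled(uint32_t eebpcr, unsigned cpu)
{
    return cpu <= 1u && (eebpcr & PPC_BIT32(7u - cpu)) != 0u;
}
```

Under these assumptions, 0x8003FFFF decodes to an enabled translation to 0x3FFFF000, and the EEBPCR change from 0x01000000 to 0x03000000 is exactly CPU1_EN being added to an already enabled core 0.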

When continued, core 0 then runs into its infinite loop as intended.

When attaching to core 1 and pausing it, we see that it is stuck at the instruction at 0xFFFFF114, which the debugger shows as "cleanup_glue() reent.c35 0x00000000" (see attached image). This address is part of the default boot ROM, which we do not want to execute from. Core 1 should instead be stuck in the infinite loop at the end of the StartSecondaryCpu assembly function, but it is not. While core 1 was paused in the debugger, we observed the following:

Reading memory at the core 1 stack pointer showed that the stack had been initialized.
The TLB entry registers showed that the entry was set up as intended.
The IVPR and IVOR registers were updated correctly.

So the setup code for core 1 in the StartSecondaryCpu assembly function does appear to be running.

Setup code running on core 0

#define SET_BIT_REG32(bit)               (uint32_t)(1UL << (31UL - (bit)))

#define BPTR_BOOT_PAGE_SHIFT             12U

#define Reset_BPTR_EN_BIT                0U

#define HwMemMap_Ccsr_ECM_EEBPCR         ((uint32_t) 0x00001010UL)

#define ECM_EEBPCR_CPU1_EN_BIT           6U


/* Calculate the boot page address: the last 4 KiB of RAM */
uint32_t Data;
uint32_t BootPageSize         = 4096U;
uint32_t BootPageStartAddress = 0x40000000U; /* end of RAM (1 GiB) */
uint32_t BootPageAddress      = BootPageStartAddress - BootPageSize; /* 0x3FFFF000 */

/* Copy the core 1 startup code into the boot page. */
memcpy ((void *) BootPageAddress, &StartSecondaryCpu, BootPageSize);

/* Set up the Reset_BPTR register:
   - Boot page translation enable (EN) is set to 1
   - The translation address is BootPageAddress shifted right by 12 bits
*/

Data = ((uint32_t)BootPageAddress >> BPTR_BOOT_PAGE_SHIFT) | SET_BIT_REG32(Reset_BPTR_EN_BIT);
HwMemMap_WriteCcsrReg (HwMemMap_Ccsr_Reset_BPTR, Data);

/* Start core 1 by setting bit CPU1_EN in ECM_EEBPCR register */
HwMemMap_ReadCcsrReg (HwMemMap_Ccsr_ECM_EEBPCR, &Data);
HwMemMap_WriteCcsrReg (HwMemMap_Ccsr_ECM_EEBPCR, Data | (SET_BIT_REG32(ECM_EEBPCR_CPU1_EN_BIT)) );

Code for core 1

FUNC_START StartSecondaryCpu
.align  12
__secondary_cpu:

        /* Set MSR[SPV], MSR[ME] (machine check) and MSR[DE] (debug) */
        lis r3, 0x0200
        ori r3, r3, 0x1200
        mtmsr r3

        /* Manage L1 Caches */
        li r3, 0x2
        mtspr 0x3F2,r3 /* invalidate d-cache */
        mtspr 0x3F3,r3 /* invalidate i-cache */

        /* Setup TLB */
        lis        r5, 0x1000        /* TLBSEL = 1 (TLB1), ESEL = 0 */
        ori        r5, r5, 0x0000
        mtspr    MAS0, r5

        lis        r5, 0xC000        /* entry valid and protected */
        ori        r5, r5, 0x0a00    /* TSIZE = 0xA: 4^10 KiB = 1 GiB */
        mtspr    MAS1, r5

        lis        r5, 0x0000        /* effective page number */
        ori        r5, r5, 0x0008    /* WIMGE = 0b01000: cache-inhibited, big-endian */
        mtspr    MAS2, r5

        lis        r5, 0x0000        /* real page number */
        ori        r5, r5, 0x003f    /* UX SX UW SW UR SR: full user and supervisor access */
        mtspr    MAS3, r5

        tlbwe
        msync
        isync

        /* Setup stack */
        lis     r1, _stack_addr_cpu1@ha
        addi    r1, r1, _stack_addr_cpu1@l

        /* Prepare a terminating stack record. */
        stwu    r1, -16(r1)  /* e500 requires SP to always be 16-byte aligned */
        li      r0, 0x0000   /* load up r0 with 0x00000000 */
        stw     r0, 0(r1)    /* SysV R4 ABI supplement: the initial back-chain word must be null */
        li      r0, -1       /* load up r0 with 0xFFFFFFFF */

        stw     r0, 4(r1)    /* Make an illegal return address of 0xFFFFFFFF */

       

        /* Interrupts */
        lis        r3, 0x3E00     /* 0x3E000000 16 MSB */
        addi       r3, r3, 0x0000 /* 0x3E000000 16 LSB */
        mtspr    IVPR, r3

        lis        r4, 0x0000
        lis        r5, 0x0000
        ori        r4, r5, 0x0100
        mtspr   IVOR0, r4

        … /* [repeating] */

        ori        r4, r5, 0x1900
        mtspr  IVOR35, r4

/* Put the second cpu in an infinite loop */
loop_inf:
        b   loop_inf

        /* Fill in the empty space.  The actual reset vector is the last word of the page */
__secondary_cpu_end:
        .space 4092 - (__secondary_cpu_end - __secondary_cpu)
__secondary_reset_vector:
        b   __secondary_cpu
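To double-check the constants in the TLB setup above, the following C sketch decodes them. The field masks are assumptions based on the e500 MAS register layout, and the macro and function names are hypothetical. With EPN = 0, RPN = 0 and TSIZE = 0xA, the entry maps addresses 0x00000000 to 0x3FFFFFFF one-to-one, which covers all of RAM including the boot page copy at 0x3FFFF000.

```c
#include <stdint.h>

/* Field masks assumed from the e500 MAS register layout (32-bit view). */
#define MAS0_TLBSEL_MASK  0x30000000u  /* TLB array select: 1 = TLB1 */
#define MAS0_TLBSEL_SHIFT 28u
#define MAS1_VALID        0x80000000u
#define MAS1_IPROT        0x40000000u  /* protected from invalidation */
#define MAS1_TSIZE_MASK   0x00000F00u
#define MAS1_TSIZE_SHIFT  8u
#define MAS2_WIMGE_MASK   0x0000001Fu
#define MAS2_I            0x00000008u  /* cache-inhibited */
#define MAS3_PERM_MASK    0x0000003Fu  /* UX SX UW SW UR SR */

/* TLB array selected by a MAS0 value. */
static uint32_t mas0_tlbsel(uint32_t mas0)
{
    return (mas0 & MAS0_TLBSEL_MASK) >> MAS0_TLBSEL_SHIFT;
}

/* Page size in bytes for a MAS1 value: 4^TSIZE KiB. */
static uint64_t mas1_page_bytes(uint32_t mas1)
{
    uint32_t tsize = (mas1 & MAS1_TSIZE_MASK) >> MAS1_TSIZE_SHIFT;
    return (uint64_t)1024u << (2u * tsize); /* 4^tsize == 2^(2*tsize) */
}
```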

Question

What are we missing in the setup for enabling core 1 and getting it to run correctly?

Thank you for your time!


yipingwang · 03-16-2023

I escalated your case to the AE team; please refer to the following update from them.

I see that the customer can attach to core 1.
Can the customer set a breakpoint at the secondary core entry point "__secondary_cpu:" and stop there?
If core 1 stops at "__secondary_cpu:", can the customer step through the code to check which line causes core 1 to get stuck?

edvinl · 03-17-2023

Hello, thanks for the reply,

We made several attempts to step through the code that shall execute on core 1 but we are running into issues getting debugging information from core 1.

In short the steps taken:

Set a breakpoint at the address the boot code is copied to, using the CodeWarrior Debugger Shell command "bp 0x3ffffffc".
Perform a "Download" of our boot to RAM for core 0 and run to a breakpoint just before ECM_EEBPCR is written to enable core 1.
Perform an "Attach" to core 1.
Continue core 0 to kick core 1

What we see is that when we attach to core 1, it halts at 0xFFFFFFFC. We are therefore unable to step through the code for "__secondary_cpu" to see where it goes wrong.

Do you have any suggestions on how to debug core 1 in this case, just after it has been enabled to run?

Thanks!

edvinl · 03-22-2023

Hello,

Do you have any update on how to perform the debugging?

Best regards, Edvin

yipingwang · 03-30-2023

Please refer to the following update from the AE team.

I compared your screenshot with your code starting at __secondary_cpu; they appear to be different.

Could you ask the customer to double-check that the boot page code at 0x3ffff000 is the same as the code at 0xfffff000 as seen from core 1?

Does the customer flush the cache after copying the boot page code to 0x3ffff000?

edvinl · 03-31-2023

Hello,

Checking memory at 0x3FFFF000, we see the code that was copied from __secondary_cpu.

However, checking 0xfffff000 from the core 1 view in the debugger shows what looks like the on-chip boot, not our __secondary_cpu.

Should we expect them to be the same? As stated in our previous message, we are unsure how to set up the debugger in this case.

As for the caches, they are disabled during boot.

yipingwang · 04-03-2023

Please refer to the update from the AE team.

Q: Do we expect them to be the same? As stated in previous message, we are unsure of how to set up the debugger in this case.
Yes. Once boot page translation is enabled, the boot page space is mapped to the address configured in BPTR, so they should be the same.
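The mapping described in this answer can be written out as a small C sketch. It assumes the boot page window is the last 4 KiB of the 32-bit address space (0xFFFFF000 to 0xFFFFFFFF) and that only the offset within the 4 KiB page is carried over to the BPTR target; the function names are hypothetical.

```c
#include <stdint.h>

#define BOOT_WINDOW_BASE 0xFFFFF000u /* assumed default 4 KiB boot page window */
#define BOOT_PAGE_OFFSET 0x00000FFFu /* offset bits within a 4 KiB page */

/* Where an access inside the boot window lands when boot page translation is
 * enabled; ram_page is the 4 KiB-aligned address programmed into BPTR
 * (0x3FFFF000 in this thread). */
static uint64_t boot_window_to_ram(uint32_t addr, uint64_t ram_page)
{
    return ram_page | (addr & BOOT_PAGE_OFFSET);
}

/* Inverse: the boot window address corresponding to a RAM address inside
 * the copied boot page. */
static uint32_t ram_to_boot_window(uint64_t ram_addr)
{
    return BOOT_WINDOW_BASE | (uint32_t)(ram_addr & BOOT_PAGE_OFFSET);
}
```

Under this mapping the reset vector fetched at 0xFFFFFFFC comes from 0x3FFFFFFC, which is where the "b __secondary_cpu" word ends up after the copy.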

If you are not sure you are getting the correct state via CodeWarrior, you can configure BPTR and access the boot page space from core 0. If you cannot read the expected content in the boot page space from core 0, core 1 will not be able to execute the instructions from that address either.
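When reading the boot page from core 0 as suggested here, one quick check is to compare its last word, the reset vector, with the expected encoding of "b __secondary_cpu". This C sketch assumes the copy starts exactly at __secondary_cpu (i.e. that StartSecondaryCpu and __secondary_cpu resolve to the same address), so the branch goes from offset 0xFFC back to offset 0; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define BOOT_PAGE_WORDS 1024u /* 4096 bytes / 4 */

/* Encode a PowerPC relative branch "b target" (I-form, opcode 18, AA = 0,
 * LK = 0) placed at byte offset `from` and jumping to byte offset `to`. */
static uint32_t ppc_branch(int32_t from, int32_t to)
{
    int32_t disp = to - from;
    return 0x48000000u | ((uint32_t)disp & 0x03FFFFFCu);
}

/* Check that the last word of the page is a branch back to offset 0.
 * On target, `page` would point at the 4 KiB boot page as seen by core 0. */
static bool boot_page_reset_vector_ok(const volatile uint32_t *page)
{
    uint32_t expected =
        ppc_branch((int32_t)(4u * (BOOT_PAGE_WORDS - 1u)), 0);
    return page[BOOT_PAGE_WORDS - 1u] == expected;
}
```

For this layout the expected reset vector word is 0x4BFFF004; a different value at that location would suggest the copy did not start at __secondary_cpu or did not land where BPTR points.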

mariapalmqvist · 04-11-2023

Hi

After some debugging we have realized that the second core is now up and running.
We still cannot see what we expected in the debugger: to us it still looks like core 1 is executing at 0xFFFFxxxx, and the code the debugger shows at those addresses is not the code of our __secondary_cpu function, as described by my colleague earlier in this conversation.
However, by letting core 1 read-modify-write the value at an address in RAM while executing in the final eternal loop and continuously read that same address from core 0, we can see that the value is incremented.
This is enough for us to continue our work. Thank you very much for your help.
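The liveness check described in this post can be sketched in C as follows. In the thread the core 1 side runs in assembly; C is used here only for illustration. The shared counter location and the function names are hypothetical, and the counter must live in RAM visible to both cores with caching out of the way (caches were disabled in this boot).

```c
#include <stdbool.h>
#include <stdint.h>

/* Core 1 side: called repeatedly from the final loop. Only core 1 writes
 * the counter, so a plain read-modify-write is sufficient. */
static void heartbeat_tick(volatile uint32_t *counter)
{
    *counter = *counter + 1u;
}

/* Core 0 side: returns true if the counter changes within max_reads reads,
 * i.e. core 1 is making progress. */
static bool heartbeat_alive(const volatile uint32_t *counter, uint32_t max_reads)
{
    const uint32_t first = *counter;
    for (uint32_t i = 0u; i < max_reads; ++i) {
        if (*counter != first) {
            return true;
        }
    }
    return false;
}
```

This test only shows that core 1 is executing the loop; it does not depend on the debugger's view of the 0xFFFFxxxx addresses.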
