MPC5777C optimizations for dual-core execution in S32DS

ricardofranca · ‎06-11-2024

Hello,

Now that I have a stable dual-core execution environment for my MPC5777CEVB (thanks to your answers in previous posts, by the way!), I am trying to get into finer detail so that I can get the best performance and predictability from this MCU. During my experiments, many questions came to mind:

- Would it be beneficial (in terms of reduced resource contention) to make cpu0 run code from one PFLASH bank and cpu1 run code from another one? In this case, would it be useful to split PFLASH in several TLB entries?

- Same question for SRAM: considering it is split into three banks, is there any optimal way of splitting it between the cores? I was thinking about having one TLB entry for a shared 64KB space with write-through cache policy and platform coherency, then 192KB for cpu1 and 256KB for cpu0, but I do not know whether there is any gain from binding one core to XBAR slave port 2 and the other to XBAR slave port 4, nor whether any extra care is needed with XBAR slave port 2, as (if I understood it correctly) there is more than one memory bank there.

- Although the EVB comes with an MMO3 part number, I intend to use MMO4 chips in actual development. Will the EVB work fine with the MMO4 chip? May I run it at its maximum frequency? Speaking of which, some parts of the documentation state the maximum frequency as 300MHz and others as 306MHz. Which one is correct?

- Does the maximum EBI CLKOUT remain at 66MHz for MMO4 part numbers? In this case, shall I divide platform clock by 3 and use 50(51?)MHz as CLKOUT?

- I was using PIT0 as a trigger to run code on both cores simultaneously (to estimate the interference one gets from the other). How can I make sure both cores received the interrupt before one of them sets PIT_TFLG0[TIF]?

- When running some code on both cores (both reading from the same Flash addresses, with some reads from the same RAM addresses and some reads (and all writes) to different RAM addresses), cpu0 was quicker than cpu1 (~743us vs. ~799us) with round-robin XBAR arbitration policy. When I changed it to priority-based, times remained about the same, but a little slower for both cores (~748us vs. ~805us). In all cases, I am measuring time using each core's own TBL/TBU registers to minimize resource contention. Why was round-robin better for both cores?

- Finally, is there any example or documentation of SDA usage in the S32DS environment?

Thanks,

Ricardo

davidtosenovjan · ‎06-12-2024

Answers embedded:

- Would it be beneficial (in terms of reduced resource contention) to make cpu0 run code from one PFLASH bank and cpu1 run code from another one? In this case, would it be useful to split PFLASH in several TLB entries?

There are two dedicated flash ports. They are not split according to address, so there is no need to create multiple TLB entries.

- Same question for SRAM: considering it is split into three banks, is there any optimal way of splitting it between the cores? I was thinking about having one TLB entry for a shared 64KB space with write-through cache policy and platform coherency, then 192KB for cpu1 and 256KB for cpu0, but I do not know whether there is any gain from binding one core to XBAR slave port 2 and the other to XBAR slave port 4, nor whether any extra care is needed with XBAR slave port 2, as (if I understood it correctly) there is more than one memory bank there.

SRAM is split into two halves according to address line 18 (see Table 9-2. XBAR slave port assignments). It can be beneficial if one core accesses slave port 2 and the other core accesses slave port 4 (or vice versa), as both accesses can then proceed in parallel.
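As a minimal sketch of how one might check a planned memory layout against this split: the helpers below assume that "address line 18" means bit 18 counted from the least significant bit (a 256KB boundary) and that SRAM starts at 0x40000000. Both are assumptions, as are the function names. Which half belongs to which slave port is not stated here and should be taken from Table 9-2.

```c
#include <stdint.h>

/* Assumption: address line 18 is bit 18 counted from the LSB (mask 0x00040000),
 * so the two SRAM halves alternate on a 256KB boundary. */
#define SRAM_HALF_SELECT_BIT 18u

/* Returns 0 or 1: the SRAM half that an address falls into. */
static inline unsigned sram_half(uint32_t addr)
{
    return (unsigned)((addr >> SRAM_HALF_SELECT_BIT) & 1u);
}

/* Returns 1 if the region [base, base + size) stays inside a single half. */
static inline int region_in_single_half(uint32_t base, uint32_t size)
{
    if (size == 0u) {
        return 1;
    }
    /* A region larger than one half must cross the boundary. */
    if (size > (1u << SRAM_HALF_SELECT_BIT)) {
        return 0;
    }
    return sram_half(base) == sram_half(base + size - 1u);
}
```

With the layout from the question, a 256KB region for cpu0 at the (assumed) base 0x40000000 sits entirely in one half, and the 64KB shared plus 192KB cpu1 regions can together fill the other half starting at 0x40040000.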

- Although the EVB comes with an MMO3 part number, I intend to use MMO4 chips in actual development. Will the EVB work fine with the MMO4 chip? May I run it at its maximum frequency? Speaking of which, some parts of the documentation state the maximum frequency as 300MHz and others as 306MHz. Which one is correct?

300MHz is the maximum unmodulated frequency; 306MHz includes frequency modulation.

- Does the maximum EBI CLKOUT remain at 66MHz for MMO4 part numbers? In this case, shall I divide platform clock by 3 and use 50(51?)MHz as CLKOUT?

The EBI maximum frequency remains the same, which in practice means that with the 300MHz variant you will have to use a lower EBI frequency than with the 264MHz version, as the CLKOUT divider does not offer more options.
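The arithmetic behind this can be sketched as follows. The helper name and the set of available dividers (integers from 1 up to a caller-supplied maximum) are assumptions for illustration. The platform clock is taken as half the core clock, which is inferred from the "divide by 3 to get 50MHz" figure in the question and is not confirmed here.

```c
#include <stdint.h>

/* Returns the smallest integer divider in 1..max_div for which
 * source_hz / divider does not exceed limit_hz, or 0 if none fits.
 * The divider range is a hypothetical parameter, not a documented field. */
static uint32_t smallest_clkout_divider(uint32_t source_hz,
                                        uint32_t limit_hz,
                                        uint32_t max_div)
{
    for (uint32_t d = 1u; d <= max_div; ++d) {
        /* Compare in 64 bits to avoid overflow and rounding from division. */
        if ((uint64_t)source_hz <= (uint64_t)limit_hz * d) {
            return d;
        }
    }
    return 0u;
}
```

Under those assumptions, a 264MHz core (132MHz platform clock) reaches the 66MHz limit exactly with a divider of 2, while a 300MHz core (150MHz platform clock) needs a divider of 3 and ends up at 50MHz, or 51MHz if the modulated 306MHz figure is used.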

- I was using PIT0 as a trigger to run code on both cores simultaneously (to estimate the interference one gets from the other). How can I make sure both cores received the interrupt before one of them sets PIT_TFLG0[TIF]?

I am not sure I understand this point completely, but the INTC is capable of sending an interrupt request to both cores (INTC_PSRn[PRC_SELn] = 0b).

- When running some code on both cores (both reading from the same Flash addresses, with some reads from the same RAM addresses and some reads (and all writes) to different RAM addresses), cpu0 was quicker than cpu1 (~743us vs. ~799us) with round-robin XBAR arbitration policy. When I changed it to priority-based, times remained about the same, but a little slower for both cores (~748us vs. ~805us). In all cases, I am measuring time using each core's own TBL/TBU registers to minimize resource contention. Why was round-robin better for both cores?

You may also want to pay attention to parking control. Ideally, as I already answered, each core should access a different SRAM half and a different flash address (there are internal prefetch buffers, but accesses to the same address do not run simultaneously and may cause wait states).
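Since the timings in the question come from each core's TBL/TBU pair, here is a minimal sketch of the usual way to combine the two 32-bit halves into one consistent 64-bit value: read upper, lower, upper again, and retry if the upper half changed in between. The register accessors are passed in as function pointers, and the `sim_*` functions are host-side stand-ins invented for this example. On the target they would be replaced by whatever reads TBU and TBL in your environment.

```c
#include <stdint.h>

typedef uint32_t (*tb_read_fn)(void);

/* Reads a 64-bit time base from two 32-bit halves without tearing:
 * if the upper half changed while the lower half was being read,
 * the lower half may have wrapped, so the read is repeated. */
static uint64_t read_timebase64(tb_read_fn read_tbu, tb_read_fn read_tbl)
{
    uint32_t hi, lo, hi_again;
    do {
        hi = read_tbu();
        lo = read_tbl();
        hi_again = read_tbu();
    } while (hi != hi_again);
    return ((uint64_t)hi << 32) | lo;
}

/* Host-side simulation (hypothetical): a counter placed just before a
 * lower-half wrap, advancing by one tick on every lower-half read. */
static uint64_t sim_tb = 0x00000000FFFFFFFFull;

static uint32_t sim_read_tbu(void)
{
    return (uint32_t)(sim_tb >> 32);
}

static uint32_t sim_read_tbl(void)
{
    uint32_t lo = (uint32_t)sim_tb;
    sim_tb += 1u;
    return lo;
}
```

In the simulated case the first pass sees the upper half change across the wrap and is discarded, so the returned value is taken entirely from after the wrap instead of mixing an old upper half with a new lower half.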

- Finally, is there any example or documentation of SDA usage in the S32DS environment?

What do you mean by the 'SDA' abbreviation?

ricardofranca · ‎06-19-2024

Hi David,

Thanks for all the answers! I will focus on the ones I might have not understood:

- I was using PIT0 as a trigger to run code on both cores simultaneously (to estimate the interference one gets from the other). How can I make sure both cores received the interrupt before one of them sets PIT_TFLG0[TIF]?
I am not sure I understand this point completely, but the INTC is capable of sending an interrupt request to both cores (INTC_PSRn[PRC_SELn] = 0b).

Indeed, I am sending the interrupt to both cores. In the example provided in the S32DS knowledge base, only one core serviced the interrupt and it had to set PIT_TFLG0[TIF] (apparently, to reset the interrupt until PIT0 sets it again). I would like to know whether the interrupt is sent to both cores in the same clock cycle, so that if one of them sets PIT_TFLG0[TIF] as soon as the ISR function starts, there is no risk of the other core "not seeing" the interrupt because the first one cleared it.

- Finally, is there any example or documentation of SDA usage in the S32DS environment?
What do you mean by the 'SDA' abbreviation?

My bad! I am referring to the Small Data Areas. While I could make them work in the GHS environment, I cannot understand how gcc handles them... it was creating .sdata and .sbss sections and allocating things there by default, but it did not look like it was doing accesses indexed by r13.

davidtosenovjan · ‎06-25-2024

Maybe you need to enable it as follows:

https://www.nxp.com.cn/docs/en/release-note/S32SDK_Power_Architecture-RN.pdf
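For reference, here is a hedged sketch based on the standard GCC PowerPC options. Whether the S32DS toolchain and project settings expose them in exactly this way is an assumption, and the variable and function names are invented for illustration. By default (`-msdata=data`) GCC places small objects in .sdata/.sbss but does not address them through r13, which matches the behaviour described above. With `-msdata=eabi`, small non-const data is addressed relative to r13 and small initialized const data goes to .sdata2, addressed relative to r2. `-G` sets the size threshold in bytes.

```c
/* Assumed build flags (standard GCC PowerPC options):
 *   -msdata=eabi   use r13 for .sdata/.sbss and r2 for .sdata2
 *   -G 8           objects of up to 8 bytes count as "small" (GCC default)
 * The startup code is expected to load r13 and r2 with the linker-provided
 * small data base symbols before any C code runs. */
#include <stdint.h>

uint32_t tick_count;                 /* small, uninitialized: candidate for .sbss  */
uint32_t tick_step = 1u;             /* small, initialized:   candidate for .sdata */
const uint32_t tick_limit = 1000u;   /* small const:          candidate for .sdata2 */

/* Advances the counter by tick_step, wrapping at tick_limit.
 * With r13/r2-relative addressing each access can be a single load or store. */
uint32_t tick_advance(void)
{
    tick_count += tick_step;
    if (tick_count >= tick_limit) {
        tick_count = 0u;
    }
    return tick_count;
}
```

A quick way to confirm the effect is to inspect the disassembly of `tick_advance` and look for loads and stores that use r13 (or r2 for the constant) as the base register.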

Regarding interrupts:
The interrupt request is triggered in the same cycle for both cores. ISR handling, however, is application specific.
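One possible application-level scheme, sketched under assumptions: each core records its arrival in a shared bitmask at the top of its PIT0 ISR, and only the last core to arrive clears the flag. All names are hypothetical, and `sim_pit_tflg0` is a host-side stand-in for PIT_TFLG0. The sketch assumes C11 atomics are usable on the target and that the shared word sits in coherent, suitably cached (or uncached) memory. It also assumes that a core which leaves the flag pending may re-enter its ISR before the other core arrives, which is why repeated arrivals are reported separately.

```c
#include <stdatomic.h>
#include <stdint.h>

#define ALL_CORES_MASK 0x3u   /* bit 0 = cpu0, bit 1 = cpu1 */

enum arrive_result {
    ARRIVE_WAITING,    /* first core in: flag left pending for the other core */
    ARRIVE_DUPLICATE,  /* this core was already counted for this period */
    ARRIVE_LAST        /* both cores seen: flag cleared by this call */
};

static atomic_uint cores_seen;                 /* shared arrival bitmask */
static volatile uint32_t sim_pit_tflg0 = 1u;   /* stand-in: 1 = TIF pending */

/* Stand-in for the register write that clears PIT_TFLG0[TIF] on hardware. */
static void sim_clear_tif(void)
{
    sim_pit_tflg0 = 0u;
}

/* To be called at the top of each core's PIT0 ISR. */
enum arrive_result pit0_isr_arrive(unsigned core_id)
{
    unsigned mine = 1u << core_id;
    unsigned prev = atomic_fetch_or(&cores_seen, mine);

    if (prev & mine) {
        return ARRIVE_DUPLICATE;
    }
    if ((prev | mine) == ALL_CORES_MASK) {
        sim_clear_tif();                 /* clear the source first */
        atomic_store(&cores_seen, 0u);   /* then re-arm for the next period */
        return ARRIVE_LAST;
    }
    return ARRIVE_WAITING;
}
```

Given that the request reaches both cores in the same cycle, this bookkeeping may be unnecessary in practice. It is mainly useful as a check that neither core missed a period, for example when one core had interrupts masked at the time.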